Sir:

1. I, Cameron Jennings, PhD, declare and say I am a resident of 1 Janice Street, Warners
Bay, New South Wales 2282, Australia. 



2. I am an employee of Graham A Brown and Associates. My previous scientific 
employment consists of Immune System Therapeutics Ltd (Sydney), Harvard University
(Boston) and La Trobe University (Melbourne). Details of my career as well as 
publications may be found in my curriculum vitae (Exhibit A). 

3. I understand that US Patent Application No. 10/590,690 (referred to hereafter as the
"Patent Application") is assigned to Immune System Therapeutics Limited. I presently
own options which would allow me to purchase shares in Immune System Therapeutics
Limited should the company become publicly listed.

4. I have been asked by FB Rice & Co, Patent Attorneys for Immune System Therapeutics 
Limited, to provide an independent opinion on the state of knowledge surrounding kappa 
and lambda light chains and on the invention described in the Patent Application. I have 
been asked in particular to comment on the obviousness rejections set out in the Office 
Action dated 11 September 2009 issued in connection with the Patent Application.

5. I have read the Patent Application and have reviewed the claims that I understand are 
presently being considered by the United States Patent and Trademark Office. I 
understand that the claimed invention (referred to hereafter as the "Invention") relates to 
methods of treating B-cell disorders by administering an antibody or ligand that
specifically binds to lambda myeloma antigen (LMA). The importance of this invention is
that it provides, for the first time, LMA as a target that is selective for tumor B-cells
expressing LMA. Antibodies and ligands that bind LMA do not target normal B-cells
which express lambda light chain in the context of intact immunoglobulin. 

6. I have considerable experience in the technical field of the claimed invention. I have over 
3 years of experience in research involving immunoglobulin light chains and membrane- 
bound proteins. Prior to this time, my research consisted of a senior post-doctoral 
fellowship at Harvard University researching membrane bound malaria vaccine candidates 
and a PhD that was co-supervised by Professor Marilyn Anderson (Biochemistry 
Department at La Trobe University) and Professor David Craik (Institute of Molecular 
Bioscience at The University of Queensland). 

7. I have reviewed the Office Action dated 11 September 2009 in relation to the above-
referenced Patent Application. I understand that the Patent Office has taken the position
that the Invention would be obvious to a person skilled in the art in light of Uhr et al. (US 
7,792,447), Raison et al (WO 03/004056) and Abe et al (1993). 

8. Uhr et al are cited as disclosing antibodies that bind intact immunoglobulin associated 
lambda light chain on tumor cells. Raison et al are cited as disclosing that malignant 
cells express both kappa and lambda light chains, that free kappa light chain is expressed 
on the cell surface of kappa light chain expressing myeloma cells and that antibodies 
which bind free kappa light chain can be used to treat tumors. Abe et al are cited as 
disclosing antibodies that bind free light chain and not intact immunoglobulin. 

9. It is my understanding that the Patent Application and the cited publications are to be 
viewed from the perspective of one of ordinary skill in the art in the relevant field (a 
"Skilled Person") at the time of filing of the Patent Application in question. I have been 
asked to consider this time to be the period around or before 27 February 2004 ("the
Relevant Period"). I would expect a Skilled Person in the field of antibody therapy during
the Relevant Period to have been represented by a scientist with a PhD degree in
Biochemistry and/or at least 3 to 5 years experience in the field of Biochemistry, or an
educational background at the same degree level in a related field and equivalent level of
experience.

10. I am very familiar with the technical field of the claimed invention. I am qualified to 
analyze literature in this field and to provide my opinion as to what literature in this field
discloses or suggests to the Skilled Person at the Relevant Period.

11. By the Relevant Period I had attained at least the level of such a Skilled Person, and
further in view of my qualifications discussed above, I believe that I am qualified by
training and experience to address what a Skilled Person would have understood from 
reading the Patent Application and the cited publications. 

12. The Examiner's objection appears to be based on the premise that kappa and lambda light
chains are the two known alleles of immunoglobulin light chain and therefore once free
kappa light chain had been found expressed on the surface of myeloma cells, it would
have been expected that free lambda light chains would also be expressed on the surface
of malignant B cells.

13. In my view, it is not correct to assume that the expression and localisation of the kappa 
light chain in myeloma cells is in any way predictive of the expression and localisation of
lambda light chains in malignant B cells. 

14. Immunoglobulins are composed of two chains, light and heavy, which form a functional
heterodimer. The light chain molecules are of two protein families, "kappa" and
"lambda". The kappa and lambda gene loci are located on chromosomes 2 and 22
respectively and differ in their number of variable (V), joining (J) and constant (C) genes
as well as their general arrangement. For example, there is a single C kappa gene (Zimmer
et al., 1990) while there are at least 4 functional C lambda genes (Dariavach et al., 1987).
While both kappa and lambda light chains maintain a conserved role when present in
intact immunoglobulin, no normal biological function has been attributed to light chains
alone.

15. Although kappa and lambda light chains both comprise a conserved (C) and variable (V)
domain (within the respective families) and they both complete a multimeric
immunoglobulin, the proteins share minimal sequence identity within their C domains
(Kabat et al., 1975). Furthermore, the variable domains of the kappa and lambda light
chains are derived from genetic recombination events and the recombined genes also
undergo the process of somatic hypermutation to give rise to antigen binding moieties (in
the context of the whole immunoglobulin). As such the V regions vary in their sequence
as well.

16. Recent information indicates that although kappa and lambda V domains have a typical
immunoglobulin fold consisting of two antiparallel β-barrel sheets, they diverge in the
processes they use to force the phenomenon known as the 'strand switch' (James et al.,
2007). This phenomenon is dependent on local structure and confirms the variability in
sequence, structure and biological function between the two immunoglobulin light chains.

17. The immunoglobulin beta sandwich, referred to above, contains seven beta strands which
form a sandwich of two beta sheets. This structural motif is found on a vast array of
protein sub-families with diverse biological activities and sub-cellular locations (for a
complete set of proteins in the SCOP data base, see Murzin et al., 1995; available from
http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.c.b.html). Proteins that fall within this
structural family include Myelin oligodendrocyte glycoprotein (MOG), the T-cell antigen 
receptor and TREM-1 (triggering receptor expressed on myeloid cells) to name a few. 

18. As further evidence of differences in structure of kappa and lambda light chain, the
Peptostreptococcus magnus protein L binds kappa light chains but does not bind lambda
light chains (Graille et al., 2001). This again suggests variation in sequence and local
structure of the kappa and lambda light chains.

19. The differences in the structures of the kappa and lambda light chains are further
highlighted by observations indicating that lambda antibodies are found in two-thirds of
light chain amyloidosis cases, whereas kappa light chains mediate greater than 85% of
Light-Chain Deposition Disease (LCDD). Thus, the light chain composition appears to 
confer different pathologies to kappa or lambda light chains (James et al., 2007). 

20. The difference in the structure of amyloid fibrils (fibrillar) and the LCDD deposits
(amorphous) also reflects the difference in the general structure of kappa and lambda light
chains (Khurana et al., 2001). Moreover, lambda light chains exist predominantly as
dimers while kappa light chains are mostly present as monomers (Bradwell, 2008;
Solomon and Weiss, 1995). As a consequence, the character and rate of the catabolic
processes that are involved in the clearance of kappa and lambda light chains are different.
It has been suggested that this phenomenon contributes to the predominance of lambda
light chains in amyloidosis (Solomon and Weiss, 1995).

21. There exist three broad categories of surface expressed proteins: i) integrally associated
proteins possessing hydrophobic surfaces that readily interact with the acyl core of the
bilayer; ii) membrane proteins that are covalently attached to certain phospholipids; and
iii) peripheral proteins that associate with the membrane through charge-charge
electrostatic interactions (Sachs and Engelman, 2006). Examination of the sequence of
kappa and lambda light chains does not reveal candidate residues suitable for covalent
attachment to the membrane. Thus, it can be surmised that the membrane interaction of
either kappa or lambda light chains would occur via hydrophobic and/or electrostatic
interactions.



22. Differences in the primary sequence of kappa and lambda light chains affect not only the
charge of the exposed side chains of the proteins but also their predominant presence as
either monomers or dimers respectively. Thus, in view of the differences in primary
sequence of kappa and lambda light chains, which in turn affect the presence/absence of
exposed hydrophobic surfaces, and the lack of a membrane targeting sequence in these 
proteins, the expression of either of these two proteins on the cell surface could not be 
predicted. Thus, it could not have been predicted by analysis of the structure or sequence 
of kappa or lambda light chains that either of these proteins would associate with the 
membrane of malignant B cells. 

23. In my opinion, therefore, there is nothing in the cited prior art to suggest that a Skilled
Person could have predicted that free lambda light chain would be associated with the
membrane of tumor B-cells. There was no suggestion nor motivation, therefore, around
February 2004 for a Skilled Person to investigate free lambda light chains as a potential
therapeutic target on tumor B-cells. 

24. I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that 
these statements were made with the knowledge that willful false statements and the like
so made are punishable by fine or imprisonment, or both, under § 1001 of Title XVIII of
the United States Code, and that such willful false statements may jeopardize the validity 
of the Patent Application or any patent issuing thereon. 

7^ AWa ZO[Q 

Cameron Jennings, PhD
The kidney and monoclonal free light chains



AR Bradwell, P Cockwell and C Hutchison



13.1. Introduction 

13.2. Normal free light chain clearance and metabolism 

13.3. Nephrotoxicity of monoclonal free light chains 

13.4. Diagnosis of myeloma kidney using sFLC analysis 

13.5. Removal of free light chains by plasma exchange 

13.6. Model of free light chain removal by plasma exchange and 

13.7. Removal of free light chains by haemodialysis

13.8. Recovery of renal failure following FLC removal by 



Summary: Monoclonal free light chains:-



1. Cause renal impairment in approximately 30% of patients with MM and dialysis 
dependent renal failure in 10%. 

2. Should be measured in all MM patients to identify those at risk of renal damage. 

3. Are not adequately removed by plasma exchange. 

4. Can be removed by haemodialysis using "high cut-off" dialysers, leading to renal 
13.1. Introduction

Renal failure is a major cause of morbidity and mortality in patients with MM. At 
initial presentation, up to 50% of patients have renal impairment (serum creatinine 
>1.5 mg/dL or >130 µmol/L); 12 to 20% have acute renal failure (ARF) and 10% become
dialysis dependent. They represent 2% of the dialysis population and there are 
approximately 5,000 new patients, worldwide, each year. Furthermore, there is a 50% 
mortality within 6 months of diagnosis (Kyle 7, Hutchison 2) 

While reversible factors such as dehydration, hypercalcaemia and medication are 
frequently involved, monoclonal FLCs are the most potent cause of irreversible renal 
failure. Large amounts of sFLCs readily pass through the glomerular fenestrations and 
overwhelm the absorptive capacity of the proximal tubules. On entering the distal 
tubules, they co-precipitate with Tamm-Horsfall protein to form waxy casts that both
block the flow of urine and cause interstitial inflammation (Herrera 1). Furthermore, 
high concentrations of FLCs are directly toxic to tubular cells (Sanders 1,2). 

Studies have analyzed renal recovery rates after FLC removal by plasma exchange. 
This is a logical approach, but results have been disappointing. Although an early report 
was optimistic (Zucchelli), the largest and most recent controlled trial (97 patients) 
showed no clinical benefit (Clark WF). A subsequent editorial in the Journal of the 
American Society of Nephrology (JASN) listed the shortcomings of this study, including 
the failure to monitor either serum or urine FLC concentrations (Ritz). It was noted,
"This resembles anti-hypertensive treatment without measuring blood pressure."
Clearly, the efficiency of plasma exchange for FLC removal could not be judged.

Because FLCs are relatively small protein molecules (κ ~25 kDa; dimeric λ ~50 kDa)
they are present in similar concentrations in serum, extravascular compartments and 
tissue oedema fluid (Takagi). Thus, the intravascular compartment may contain only 15 
to 20% of the total amount. A series of 3.5 litre plasma exchanges that removed 65% of 
intravascular FLCs on each occasion would have little overall impact, particularly if 
production were not reduced at the same time by chemotherapy. An alternative approach 
is to remove FLCs by haemodialysis. Although this is not possible with routine dialyzers
because of their small pore sizes (12-15 kDa), a new generation of "high cut-off"
dialyzers allows FLC removal (Hutchison 2). By using extended dialysis, large
amounts of FLCs can be removed without the attendant clotting and deproteination 
problems that may limit the extended use of plasma exchange. 
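
The compartment argument above can be checked with simple arithmetic. The following is only an illustrative sketch: it uses the figures quoted in the text (15 to 20% of FLCs in the intravascular compartment, 65% of intravascular FLCs removed per exchange) and assumes, for simplicity, that compartments fully re-equilibrate between exchanges and that no new FLC is produced. The function names are hypothetical.

```python
def fraction_removed_per_exchange(intravascular_fraction, removal_efficiency):
    """Fraction of total-body FLC removed by a single plasma exchange."""
    return intravascular_fraction * removal_efficiency


def fraction_remaining(intravascular_fraction, removal_efficiency, n_exchanges):
    """Fraction of total-body FLC left after n exchanges.

    Simplifying assumptions (not from the source): compartments fully
    re-equilibrate between exchanges and FLC production is zero.
    """
    per_exchange = fraction_removed_per_exchange(
        intravascular_fraction, removal_efficiency
    )
    return (1.0 - per_exchange) ** n_exchanges


# Figures quoted in the text: 15-20% intravascular, 65% removed per exchange.
for iv_fraction in (0.15, 0.20):
    removed = fraction_removed_per_exchange(iv_fraction, 0.65)
    print(f"intravascular share {iv_fraction:.2f}: "
          f"{removed:.3f} of total-body FLC removed per exchange")
```

Under these assumptions a single exchange removes only about 10 to 13% of the total body load, which is consistent with the statement that a series of exchanges has little overall impact unless production is also reduced.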

This chapter discusses the normal renal handling of FLCs, their role in renal failure 
in MM, clinical case studies and current management strategies such as plasma 
exchange. A mathematical model of FLC removal is presented and plasma exchange is 
compared with the utility of haemodialysis. Finally, clinical evidence for the beneficial 
use of "FLC removal haemodialysis" is presented. 

13.2. Normal free light chain clearance and metabolism 

In normal individuals, sFLCs are rapidly cleared by the kidneys depending upon their 
molecular size (Figure 13.1 and Chapter 3). Monomeric FLCs, characteristically κ,
are cleared in 2-4 hours at 40% of the glomerular filtration rate. Dimeric FLCs, typically
λ, are cleared in 3-6 hours at 20% of the glomerular filtration rate, while larger polymers
are cleared more slowly. Removal is prolonged to 2-3 days in MM patients who are in 
complete renal failure when FLCs are removed by the liver and other tissues. In 
contrast, IgG has a normal serum half-life of 21 days that is not affected by renal
impairment.
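
To give a feel for how different these clearance rates are, the sketch below models each species with simple first-order (exponential) decay. This is an assumption made for illustration: the text gives clearance times of 2-4 hours and 3-6 hours without stating the kinetic model, so the midpoints are treated here as half-lives. Only the 21-day IgG half-life is stated as such in the text. The names are hypothetical.

```python
def fraction_left_after(hours_elapsed, half_life_hours):
    """Fraction of the starting serum amount left, assuming first-order decay."""
    return 0.5 ** (hours_elapsed / half_life_hours)


# Half-lives in hours. The FLC values are assumed midpoints of the quoted
# ranges (2-4 h and 3-6 h); the IgG value is the 21 days given in the text.
HALF_LIFE_HOURS = {
    "kappa FLC monomer (assumed 3 h)": 3.0,
    "lambda FLC dimer (assumed 4.5 h)": 4.5,
    "IgG (21 days)": 21 * 24.0,
}

for name, half_life in HALF_LIFE_HOURS.items():
    left = fraction_left_after(24.0, half_life)
    print(f"{name}: {left:.4f} of starting amount left after 24 h")
```

The comparison ignores ongoing production, so it describes relative turnover rather than the steady-state serum concentrations seen in patients.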

After filtration by the glomeruli, FLCs enter the proximal tubules and bind to brush-
border membranes via low-affinity, high-capacity receptors called cubilins and
megalins (Johnson RJ). Binding provokes internalisation of the FLCs, subsequent
proteolysis into smaller peptides and finally their excretion into the urine flow. The 
concentration of FLCs leaving the proximal tubules, therefore, depends upon the amount 
in the glomerular filtrate, competition for binding uptake from other proteins and the 
absorptive capacity of the tubular cells. A reduction in GFR, due to loss of nephrons, 
increases sFLC concentrations so that more is filtered by the remaining functioning 
nephrons. Subsequently, and with increasing renal failure, hyperfiltering glomeruli leak 
albumin and other proteins which compete with FLCs for absorption thereby causing 
more to enter the distal tubules. 

FLCs entering the distal tubule can bind to uromucoid (Tamm-Horsfall protein). This 
is the predominant protein in normal urine and is thought to be important in preventing 
ascending urinary infections. It is a glycoprotein (85kDa) that aggregates into high 
molecular weight polymers of 20-30 units. Interestingly, it contains a short peptide 
motif that has a high affinity for FLCs (Ting). 

13.3. Nephrotoxicity of monoclonal free light chains 

The main pathology in myeloma kidney is cast nephropathy. This is caused by 
precipitation of FLCs with uromucoid as waxy casts and is characteristically found in 
acute renal failure associated with MM (Figures 13.2 and 13.3) (Johnson RJ). The casts 
obstruct tubular fluid flow, leading to disruption of the basement membrane and 
interstitial damage. Rising concentrations of sFLCs are filtered by the remaining 
functioning nephrons which become blocked, leading to a vicious cycle of further 
increases in sFLC concentrations and progressive renal damage. This may explain why 
some MM patients, without apparent pre-existing renal impairment, suddenly develop 




Figure 13.2. Waxy cast from the urine of a patient with multiple myeloma.
(Courtesy of R Johnson and J Feehally).

Figure 13.3. Classic casts in the distal tubules of a patient with light chain multiple
myeloma. (Courtesy of C Hutchison).
Human immunoglobulin Cλ6 gene encodes the Kern⁺Oz⁻ λ chain
and Cλ4 and Cλ5 are pseudogenes

(isotypes/allotypes/Bence Jones proteins/constant region genes) 
P. Dariavach, G. Lefranc, and M.-P. Lefranc 

Laboratoire d'Immunogénétique, Centre National de la Recherche Scientifique Unité Associée 1191, Génétique Moléculaire, Université des Sciences et
Techniques du Languedoc, Place E. Bataillon, 34060 Montpellier Cedex, France

Communicated by C. Milstein, August 3, 1987 (received for review May 28, 1987) 



ABSTRACT Six nonallelic immunoglobulin λ constant region genes have been
previously characterized on a 40-kilobase stretch of DNA. The nucleotide sequences of
the three upstream genes of this cluster (Cλ1, Cλ2, Cλ3) have been determined by other
workers and shown to encode, respectively, the isotypic Mcg, Kern⁻Oz⁻, and Kern⁻Oz⁺
constant region of the λ chains. In this paper, we report the sequence of the three
downstream genes of this cluster and show that two of them (Cλ4 and Cλ5) are
pseudogenes. However, Cλ6 encodes a Kern⁺Oz⁻ chain and corresponds to the fourth
isotype described among the λ proteins sequenced so far. A potentially active Jλ6
(joining) segment, with the canonical heptamer and nonamer sequences for rearrangement,
is located 1.5 kilobases upstream of Cλ6. The amino acid sequence encoded by the Cλ6
gene is compared with the constant region sequences of various monoclonal Bence Jones
λ proteins. Allotypic and isotypic differences confirm the polymorphism and complexity
of the human Cλ locus.



In humans, the constant (C) region of the immunoglobulin λ light chains consists of at
least four nonallelic or isotypic forms that differ by limited amino acid substitutions to
produce the serological markers Kern (Ke) (1, 2), Oz (3-5), and Mcg (6, 7). Several
additional substitutions have been described (8-16), but it is unknown whether these
represent allelic variants or distinct isotypes. The human immunoglobulin λ light chain
genes have been mapped to chromosome 22 (17) at band q11 (18, 19), and six nonallelic
λ C region genes (Cλ1 to Cλ6) have been characterized on a 40-kilobase (kb) stretch of
DNA (20). The number of Cλ genes varies between six and nine per haploid genome (21).
These variations were detected by restriction fragment length polymorphism (21) and
seem to have arisen from unequal meiotic crossing-over with a duplication of the Cλ2 and
Cλ3 genes. Moreover, three additional Cλ-like genes have been recently identified, which
map on different stretches of DNA and are nonallelic (22). One of these is a pseudogene,
whereas the two others encode a putative λ chain C region whose sequence differs from
that of the λ chains described so far.

Only three Cλ genes (Cλ1, Cλ2, and Cλ3) belonging to the cluster described by Hieter
have been sequenced (20), and they have been shown to encode, respectively, the Mcg,
Ke⁻Oz⁻, and Ke⁻Oz⁺ C region of the λ chains. In this paper, we report the sequences* of
the three genes located downstream in this cluster and show that two of them (Cλ4 and
Cλ5) are pseudogenes, whereas Cλ6 encodes a Ke⁺Oz⁻ chain, the fourth isotype described
among the proteins sequenced so far. This Cλ6 gene has a potentially active Jλ6 joining
region, with the canonical heptamer and nonamer sequences for rearrangement, 1.5 kb
upstream of the coding C region.



The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertisement" 
in accordance with 18 U.S.C. §1734 solely to indicate this fact. 



MATERIALS AND METHODS 

Construction of a Phage Library from LY67 DNA. DNA prepared from LY67 cells (a
λ-producing Burkitt's lymphoma) (23) was partially digested with Mbo I. Restriction
fragments 15-20 kb long were ligated into BamHI-digested DNA of phage λ2001 (24) and
packaged in vitro. Recombinant phages were screened by the in situ plaque hybridization
procedure (25).

Probes. A genomic clone (Chr 22λ5) in λgt-λWES (26) was kindly provided by T. H.
Rabbitts (Medical Research Council, Cambridge, England). This clone contains an 8.0-kb
EcoRI fragment that includes the known nonallelic Ke⁻Oz⁻ (Cλ2) and Ke⁻Oz⁺ (Cλ3) genes
and the flanking sequences (20). We subcloned a 700-base-pair (bp) Bgl II-EcoRI
fragment containing only the Ke⁻Oz⁺ Cλ3 gene (Fig. 1), and this Cλ probe cross-hybridizes
with all the other Cλ-like genes (20). It was radioactively labeled with [α-³²P]dCTP by
nick-translation (27) and was used to screen the LY67 phage library.

Subcloning and Sequencing Strategies. One clone, LY67 Cλ3-6 (Fig. 1), was shown to
contain Cλ3 to Cλ6. Appropriate subclones were made in pUC vectors (29). Nucleotide
sequence analysis was carried out by dideoxy chain-termination procedures (30) in M13
vectors (31) by deploying exonuclease III-nuclease S1 methods (32) or directed
sequencing using known restriction enzyme sites.

Oligonucleotide Synthesis and Hybridization. A 19-mer oligonucleotide
5' GTGTTCGGCGGAGGGACCA 3' corresponding to part of the Jλ3 gene segment sequence
(this paper and ref. 28) was synthesized, radiolabeled, and hybridized to the LY67 Cλ3-6
clone to search for other Jλ segments. Low-stringency washes were carried out at room
temperature.

RESULTS 

Rearrangement of a VλIII Subgroup Gene to Jλ3 in the LY67 Cell Line. One clone (LY67
Cλ3-6) containing an 18-kb piece of genomic DNA was isolated and characterized. A
restriction map of this clone is shown in Fig. 1. Comparison of this map with one
previously published (20) suggested that this clone contains four Cλ genes, namely Cλ3 to
Cλ6. The sequence of the 5' end of the LY67 Cλ3-6 clone shows that a Vλ gene
rearrangement has occurred, joining this gene to the Jλ3 gene segment, which is located
1.5 kb upstream of Cλ3 (28). Fig. 2A shows the partial nucleotide sequence of the
rearranged Vλ in LY67 and that of a Vλ gene assigned to the VλIII subgroup and isolated
from the Burkitt lymphoma cell line PA682 (28).



Abbreviations: C, constant; J, joining; V, variable; Ke, Kern. 

*The sequences reported in this paper are being deposited in the EMBL/GenBank data
base (Bolt, Beranek, and Newman Laboratories, Cambridge, MA, and Eur. Mol. Biol.
Lab., Heidelberg) [accession nos. J03009 (Cλ4), J03010 (Cλ5), and J03011 (Cλ6)].
Fig. 1. (A) Restriction map of LY67 Cλ3-6 clone. (B) Sequencing
strategy. B, BamHI; Bg, Bgl II; H, HindIII; R, EcoRI; S, Sst I. Of
the Pst I sites (P), only the one used for subcloning the fragment
containing the Cλ5 gene is indicated. The rearrangement V-Jλ3 is
indicated by an arrow (V, variable region). An asterisk shows the
location of a polymorphic BamHI site present in PA682 DNA (28) but
absent from our LY67 clone.



The deduced amino acid sequence of the rearranged V gene of LY67 is also compared to
the V region of the protein DEL of the subgroup VλIII (33, 34). A 75% sequence identity
indicates that the Vλ gene rearranged in LY67 is a member of the VλIII subgroup gene
family, and this is in agreement with the detection of a transcript hybridizing to a VλIII
probe in the LY67 cell line (28). The Jλ3 segment of the LY67 Cλ3-6 clone, compared to
the Jλ3 segment rearranged in PA682, shows two nucleotide differences (one of them
resulting in a valine/leucine amino acid substitution) that may be due to allelic
polymorphism. Two other nucleotide differences are observed at the V-J junction and are
probably explained by a flexibility in the mechanism by which junctions occur (35, 36).

Cλ6 Encodes a Ke⁺Oz⁻ Chain. Fig. 2B shows the nucleotide sequence of the Cλ6 gene
and the encoded amino acid sequence (106 residues). The residues Ala, Ser, and Thr,
found, respectively, at codons 6, 8, and 57 (positions 112, 114, and 163 according to
ref. 34) indicate that Cλ6 encodes a Mcg⁻ protein. Arg (codon 83, position 190)
corresponds to the Oz⁻ marker, whereas Gly (codon 46, position 152) characterizes the
Ke⁺ marker. Therefore the Cλ6 gene encodes the fourth isotype Ke⁺Oz⁻.

Jλ6 Segment Is 1.5 kb Upstream of Cλ6. Only the Jλ1 (22) and Jλ3 segments (ref. 28 and
this paper) have been characterized; they have been localized in genomic DNAs at 1.5 kb
upstream of the respective Cλ coding regions. We therefore used an oligonucleotide
corresponding to the Jλ3 sequence (see Materials and Methods) to search for homologous
Jλ segments in the LY67 Cλ3-6 clone. As expected, a strong signal was obtained for the
Jλ3-containing fragments, whereas a weaker signal allowed us to detect the Jλ6 segment
in a Sac I-Bgl II fragment upstream of Cλ6. The sequence of this Jλ6 segment (Fig. 2C)
showed that it encodes 12 amino acids (among them the characteristic Phe-Gly-Xaa-Gly
residues) and that it also possesses the canonical heptamer and nonamer sequences
essential to V-J rearrangement (37, 38). No signal corresponding to the putative Jλ4 and
Jλ5 segments could be detected in the LY67 Cλ3-6 clone by using either the
oligonucleotides (Jλ3 probe) or the genomic Sac I-Bgl II fragment (Jλ6 probe), indicating
that if these segments exist their homology is too weak to be detected in our conditions
of hybridization. Since the Jλ3 and Jλ6 gene segments are 1.5 kb upstream of their
respective coding regions, we subcloned fragments located, respectively, at about the
same distance upstream of Cλ4 and Cλ5. Although in both cases, we
detected some conserved heptamer sequences, we did not



Fig. 2. Nucleotide and amino acid sequences. (A) Partial sequence of the LY67 Vλ-Jλ3 rearranged gene. (B) Sequence of the Cλ6 gene. (C)
Sequence alignment of Jλ6 and Jλ1 (22).





find the characteristic Phe-Gly-Xaa-Gly residues or the expected splice site at a
downstream position. It is possible that the heptamer sequences are attached to poorly
conserved pseudo-Jλ segments and we cannot exclude the possibility that the putative
Jλ4 and Jλ5 are localized in



Fig. 4. Protein sequences derived from the human Cλ gene
sequences. The standard one-letter symbols are used.



fragments that were not sequenced, upstream of the Cλ4 and
Cλ5 genes.

Cλ4 and Cλ5 Are Pseudogenes. Nucleotide sequences and the encoded amino acid
sequences of Cλ4 and Cλ5 are shown in Figs. 3 and 4. Both genes are pseudogenes; the
third codon of Cλ4 is a stop codon, and Cλ4 displays three deletions. The first deletion of
9 bp spans codons 5 to 7 and the other two deletions excise codons 21 and 64. Cλ5 has an
11-bp deletion (codons 41-44) resulting in a frameshift.



Table 1. Amino acid differences between the four nonallelic
forms of the human Cλ regions

                       Amino acid residue
Isotype        Gene    112 (6)   114 (8)   152 (46)   163 (57)   190 (83)
Mcg⁻ Ke⁻Oz⁻    Cλ2     Ala       Ser       Ser        Thr        Arg
Mcg⁻ Ke⁻Oz⁺    Cλ3     Ala       Ser       Ser        Thr        Lys
Mcg⁻ Ke⁺Oz⁻    Cλ6     Ala       Ser       Gly        Thr        Arg

Residue numbering is according to ref. 34; parentheses enclose numbering of the codons in the Cλ genes.
Fig. 5. Physical map of the human λ light chain C region. Cλ1, Cλ2, and Cλ3 correspond, respectively, to the nonallelic Mcg, Ke⁻Oz⁻, and
Ke⁻Oz⁺ chains (ref. 20; see Table 1). Cλ4 and Cλ5 are pseudogenes (ψ), whereas Cλ6 encodes a Ke⁺Oz⁻ chain. Jλ1 (22), Jλ3 (ref. 28 and this
paper), and Jλ6 (this paper) have been localized 1.5 kb upstream of Cλ1, Cλ3, and Cλ6, respectively. Jλ2 has not yet been localized in genomic
DNA. No Jλ gene segment has so far been identified upstream of Cλ4 and Cλ5.



DISCUSSION 

In human λ chain C regions, the Mcg marker involves amino 
acid residues at positions 112, 114, and 163 (numbering 
according to ref. 34; Table 1) corresponding, respectively, to 
codons 6, 8, and 57 of the Cλ genes. Mcg+ proteins have 
residues Asn-112, Thr-114, and Lys-163, whereas Mcg- 
proteins have residues Ala, Ser, and Thr, respectively, at 
these locations. However, the recently sequenced Mor protein 
is different from the other Mcg- proteins by having 
Ala-163 instead of Thr-163 (16). The Ke and Oz markers 
occur at positions 152 and 190, respectively: Ke+ proteins 
have Gly-152 and Ke- have Ser-152; Oz+ proteins have 
Lys-190 and Oz-, Arg-190. These markers define four nonallelic 
forms of human λ chain C regions (Mcg, Ke-Oz-, 
Ke-Oz+, and Ke+Oz-), which are encoded, respectively, by 
the Cλ1, Cλ2, Cλ3 (20), and Cλ6 (this paper) genes (Fig. 5). 
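
The marker rules in the preceding paragraph can be written out as a small lookup. The following is only an illustrative sketch, not part of the original paper: the function and dictionary names are hypothetical, the residue rules are those quoted in the text, and the gene names are read here as Cλ1, Cλ2, Cλ3 and Cλ6.

```python
# Marker residues by position (numbering of ref. 34), as quoted in the text.
MCG_PLUS = {112: "Asn", 114: "Thr", 163: "Lys"}
KE_BY_RESIDUE_152 = {"Gly": "+", "Ser": "-"}
OZ_BY_RESIDUE_190 = {"Lys": "+", "Arg": "-"}

# Gene assignment of the four nonallelic forms, as read from the Discussion.
GENE_FOR_ISOTYPE = {
    "Mcg": "Cλ1",
    "Ke-Oz-": "Cλ2",
    "Ke-Oz+": "Cλ3",
    "Ke+Oz-": "Cλ6",
}


def classify_isotype(residues):
    """Classify a C region from a dict {position: three-letter residue}.

    Returns one of the four isotype names, or None when the marker
    residues are missing or do not match any of the four forms.
    """
    # Mcg+ requires all three Mcg marker residues.
    if all(residues.get(pos) == aa for pos, aa in MCG_PLUS.items()):
        return "Mcg"
    ke = KE_BY_RESIDUE_152.get(residues.get(152))
    oz = OZ_BY_RESIDUE_190.get(residues.get(190))
    if ke is None or oz is None:
        return None
    isotype = f"Ke{ke}Oz{oz}"
    # Ke+Oz+ is not among the four forms described, so it yields None.
    return isotype if isotype in GENE_FOR_ISOTYPE else None


if __name__ == "__main__":
    protein = {112: "Ala", 114: "Ser", 152: "Gly", 163: "Thr", 190: "Arg"}
    isotype = classify_isotype(protein)
    print(isotype, GENE_FOR_ISOTYPE.get(isotype))
```

A sequence that carries residues outside these rules (for example the Mor protein mentioned above) would need manual inspection rather than this lookup.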

In Fig. 6, monoclonal Bence Jones proteins have been 
assigned as products of the Cλ genes 1, 2, 3, or 6 on the basis 
of the presence or absence of residues characteristic for the 
Mcg, Ke, and Oz markers. In most cases, there is complete 
concordance between the protein and the deduced amino acid 
sequence of the corresponding Cλ gene. However, other 
amino acid changes have been found in several proteins 
(Table 2 and Fig. 6). Since these substitutions have been 
noted only once, they could represent allotypic differences. 
However, it cannot be excluded that some of the Ke-Oz- 
proteins could be encoded by a Cλ gene resulting from the 
duplication of the Cλ2-Cλ3 region, as has been described in 
some individuals (21). In such cases these sequences would 
represent new isotypic differences due to the presence of 
several nonallelic copies of the Cλ2 gene. Differences observed 




Table 2. Sequence variations in the C region of human λ chains 

           Common       Residue   Variant 
Isotype    amino acid   number    amino acid   Protein   Ref. 
Mcg        Lys          156       Glu          WEIR      8 
Ke-Oz-     Lys          129       Glx          CH        9 
                                  Arg          NIG68     10 
                                  Ser          ATK       11 
           Ala          143       Val          MZ        12, 13 
           Val          155       Ile          HIL       14 
           Lys          156       Glu          SA        11 
           Ala          157       Val          WAY       1 
           Lys          172                    MZ        12 
                                  Arg          MOR       16 
           Lys          187       Gln          ATK       11 
           Gln          195       Leu          NIG68     10 
Ke-Oz+     Asp          151       Glu          EV        15 

Residue numbers according to Kabat et al. (34). 



in the J x 2 sequences (Fig. 6) might for the same reason be 
either allotypic or isotypic. Differences in the JJ segment 
region might represent allotypic differences, although the 
presence of not yet identified other J J segments cannot be 
ruled out. 

If we compare the deduced amino acid sequence of the Cλ6 
gene with two known Ke+Oz- proteins that have identical Cλ 
coding regions, SM (53) and Kern (52), the protein predicted 
for Cλ6 shows three differences: (i) lysine at position 145 
(codon 39) instead of threonine, (ii) asparagine at position 156 
(codon 50) instead of lysine, and (iii) alanine at position 212 






Fig. 6. Sequences of C regions of human λ chains. The protein sequences encoded by the four "active" Cλ genes and associated Jλ gene 
segments are compared with the λ protein C regions. The numbering of the Cλ region amino acids is according to ref. 34; e.g., positions 169, 
201, and 202 are excluded in the Cλ sequences for purposes of alignment with human Cκ chains. For an easier alignment, the Jλ and Cλ gene 
segments are considered as being spliced and a vertical line is drawn to indicate the V-J junction. The horizontal line in the SM sequence shows 
a deletion. 
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(codon 103) instead of threonine (Fig. 6). These differences 
may represent allotypic variations, although we cannot 
entirely exclude the possibility that these different Ke+Oz- 
sequences are encoded by nonallelic genes. More sequences 
of Cλ genes or λ proteins should help estimate the extent of 
the human λ chain polymorphism. 
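
The text gives several pairs of protein position (numbering of ref. 34) and gene codon: 112/6, 114/8, 163/57, 145/39, 156/50 and 212/103, and the legend of Fig. 6 states that positions 169, 201 and 202 are excluded from the Cλ sequences. A conversion consistent with all six pairs can be sketched as follows. The offset of 106 is an inference from those pairs, not a value stated by the authors, and the function name is hypothetical.

```python
# Positions excluded from the C-lambda sequences (stated in the Fig. 6 legend).
EXCLUDED_POSITIONS = (169, 201, 202)

# Assumption: inferred from the quoted pairs (e.g. position 112 = codon 6),
# not stated explicitly in the paper.
POSITION_OFFSET = 106


def position_to_codon(position):
    """Convert a ref. 34 residue position to a C-lambda codon number."""
    if position in EXCLUDED_POSITIONS:
        raise ValueError(f"position {position} has no C-lambda codon")
    # Every excluded position below this one shifts the codon number down by 1.
    skipped = sum(1 for p in EXCLUDED_POSITIONS if p < position)
    return position - POSITION_OFFSET - skipped


if __name__ == "__main__":
    for pos in (112, 114, 145, 156, 163, 212):
        print(pos, position_to_codon(pos))
```

The pairs quoted in the text serve as the check on the inferred offset; positions outside the C region are not handled by this sketch.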

Note Added in Proof. The Jλ2 gene segment has recently been 
localized upstream of Cλ2 (54). 
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Transposition of human immunoglobulin V x genes within 
the same chromosome and the mechanism of their 
amplification 
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The variable, joining and constant gene segments of the 
human immunoglobulin κ locus (Vκ, Jκ and Cκ) are 
located on the short arm of chromosome 2 at 2p11-2p12. 
Here we describe a cluster of 11 Vκ genes on the long 
arm of chromosome 2 at 2cen-q11. By pulsed-field gel 
electrophoresis, cosmid cloning and DNA sequencing the 
cluster was shown to consist of four amplified units 
(amplicons). The amplicons, each 110-160 kb in size, 
are organized within 650 kb as an array of inverted 
repeats with short stretches of non-amplified DNA in 
between. Cloning and sequencing of three different joints 
between amplified and non-amplified DNA revealed the 
existence of parts of Alu repeats at each of the analysed 
joints. It is suggested that during evolution a group of 
five Vκ genes was transposed from the short to the long 
arm of chromosome 2 by a pericentric inversion. Three 
of the five Vκ genes were then amplified in two 
subsequent steps to yield the structure found in the 
majority of the present day population. The possible 
relation of this structure to a pericentric inversion of 
chromosome 2 that is seen cytogenetically in a small 
fraction of today's population is discussed. 
Key words: Alu repeats/amplification/immunoglobulin Vκ 
genes/transposition 



Introduction 

Amplification and transposition of genes play an important 
role in the formation of multigene families during evolution 
(for review, see Maeda and Smithies, 1986). In the case of 
the human gene family coding for the variable regions of 
the immunoglobulin light chains of the κ type (Vκ), 
putative transpositions of Vκ genes led to the formation of 
a 'mixed' gene cluster, in which genes of different subgroups 
are interdigitated (Pech and Zachau, 1984). A subsequent 
duplication led to the generation of a second copy of a large 
part of the Vκ gene cluster (for reviews, see Zachau, 1989, 
1990). The κ locus contains ~70 Vκ genes (for reviews, 
see Zachau, 1989, 1990) and is located on the short arm 
of chromosome 2, at 2p12 (Malcolm et al., 1982). In 
addition, Vκ genes have been found on chromosomes 
1, 22 (Lötscher et al., 1986, 1988) and other chromosomes 
(Straubinger et al., 1988). These genes are called orphons 
in analogy to histone and ribosomal RNA genes found outside 
of their respective gene clusters (Childs et al., 1981). 
We have previously reported the structure of two contiguous 
cloned regions (contigs), called Wa and Wb, with 
a total of nine Vκ genes (Pohlenz et al., 1987). The two W 
contigs have been assigned to chromosome 2 (Lötscher 
et al., 1988) and were thought to be part of the κ locus. 
However, we did not succeed in linking them to the cloned 
parts of the κ locus by chromosomal walking or by pulsed-field 
gel electrophoresis (PFG). In the course of the PFG 
experiments, a third W contig was detected. The three 
contigs are present in the genomes of all individuals so far 
analysed. The observation that the W contigs are located on 
chromosome 2, yet are not part of the κ locus, prompted 
us to analyse their genomic organization and chromosomal 
location in more detail. 

Results 

Characterization of the W contigs 

We previously described the characterization of the two 
contigs Wa and Wb (Pohlenz et al., 1987). Here, we report 
the isolation and characterization of two sets of cosmid 
clones, one extending the contig Wb and one representing 
a new W contig, termed Wc. 

In the course of chromosomal walking experiments, the 
cosmid libraries III (Pohlenz et al., 1987) and V (Lorenz, 
1989) were screened with the W-specific clone m654-1 
(Pohlenz et al., 1987); 25 cosmid clones were isolated. Five 
of them were derived from Wa without extending the known 
contig. The other 20 clones belonged to Wb with some of 
them extending the contig. Two additional Vκ genes were 
found in the extending cosmids. A map of the W contigs 
with some representative cosmid clones is shown in Figure 1. 
A description of all clones can be found in Zimmer (1989). 

The existence of a third contig, Wc, was demonstrated 
on PFG blots that had been hybridized with the Wa-derived 
probe m167-1 (Pohlenz et al., 1987; see also Figure 1). In 
subsequent restriction mapping of cosmid clones that had 
been isolated previously with m167-1, but not fully 
characterized (Pohlenz, 1986), three clones matching neither 
Wa nor Wb were identified. Wc was shown to be a third 
independent contig and not an allelic variant of Wa or Wb 
by demonstrating the existence of Wa-, Wb- and Wc-characteristic 
fragments in the genomes of all 20 unrelated 
individuals tested (Zimmer, 1989; data not shown here). The 
map of Wc is included in Figure 1. 

We use the term amplified unit or amplicon for those parts 
of the contigs Wa, Wb and Wc that are homologous to each 
other. As Wb contains two amplified units, a minimum of 
four W amplicons (named I-IV; Figure 1) exist within the 
human genome. 

How many Vκ gene-containing W amplicons exist 
in the human genome? 

To test whether more than three Vκ gene-containing 
amplicons exist, we estimated their copy number relative 
to Cκ, which is known to be single copy, by quantitation 
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Fig. 1. Restriction maps of the genomic contigs Wa, Wb and Wc. The maps were derived from cosmid clones described in Pohlenz et al. (1987) 
and Zimmer (1989). For each contig only some representative cosmid clones are shown. The maps are aligned to demonstrate the homology between 
the contigs. Positions of restriction sites that are identical in two contigs or occur in contigs without counterpart are marked by vertical bars. 
Differences in restriction sites between two contigs are symbolized by arrows pointing to the respective map positions. In the map of Wc, restriction 
sites are given separately for the part that shows no homology to Wb or Wa. The scale (kb) applies to all contigs. A deletion of 1 kb in Wb at map 
position 92 kb is marked by a triangle. Vκ genes are drawn as filled boxes; the subgroup designations are indicated (I-III). Plasmid and M13 
subclones are shown as horizontal bars above the restriction map of Wa and underneath those of Wb and Wc. The subclones m654-1, m165-5 and 
m167-1 are described in Pohlenz et al. (1987), the remaining ones in Zimmer (1989). Amplified units are marked by fat lines and are numbered 
(I-IV). Clones derived from one amplicon hybridize to the corresponding position within other amplicons; these positions are marked by dotted 
lines. Arrows at the maps of Wa and Wb indicate the transcriptional orientation of the Vκ genes. A dot marks an NruI site at map position 72 kb, 
which is present only in some alleles of Wa. 



of Southern blot hybridizations. We chose a clone which 
hybridizes to a position close to one of the amplified Vκ 
genes (m654-1) and linked it to a fragment derived from 
the Cκ region (I-1). The construct m654-1/I-1 is shown in 
Figure 2a and the blot hybridizations are in Figure 2b. The 
copy number of the W contigs is estimated as the multiple 
of the Cκ signal (Table I). From the calculated W/Cκ ratios 
it is very likely that no further amplicons hybridizing with 
m654-1 exist. The procedure of copy number determination 
is similar to the one of Meindl (1990) who estimated the 
overall number of Vκ genes including W-type Vκ genes in 
the DNA of individual AF, and arrived at data compatible 
with three copies of the W amplicon and one copy of the 
non-amplified gene W6. The cosmid clones constituting the 
W contigs of Figure 1 are derived from the DNA of 
individual N and individual St. Since the blot hybridizations 
of restriction nuclease digests of the DNAs AF, N, 
St and PC-3 (see below) show no significant differences in 
the regions of the W-type Vκ genes (Meindl, 1990) it is 
likely that the number of Vκ gene-containing W amplicons 
is three. The amplified unit within Wc is not detected by 
hybridization with m654-1/I-1 since it does not reach into 
the Vκ gene-containing regions of the W amplicons 
(Figure 1). To identify other possibly existing truncated 
amplicons and to elucidate the genomic organization of the 
known amplicons, we established a long-range map of all 
regions hybridizing with W-specific probes by PFG. 
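
The copy number estimate described above is a simple ratio of hybridization signals, with the single-copy Cκ band as the reference. A minimal sketch of that arithmetic, using counts as read from Table I, is given below. The function name is hypothetical and the standard deviations listed in the table are not propagated here.

```python
def copy_number(w_cpm, c_kappa_cpm):
    """Estimate W copy number as the W band signal divided by the
    signal of the single-copy C-kappa band from the same lane."""
    if c_kappa_cpm <= 0:
        raise ValueError("reference signal must be positive")
    return w_cpm / c_kappa_cpm


if __name__ == "__main__":
    # Counts/min as read from Table I (blank-corrected values).
    # BglII/SphI digest: C-kappa 40, Wb 80, Wa 35.
    print(round(copy_number(80, 40), 1))   # Wb band
    print(round(copy_number(35, 40), 1))   # Wa band
    # BglII digest: C-kappa 21, combined Wa+Wb band 58.
    print(round(copy_number(58, 21), 1))
```

Rounding each ratio to the nearest integer gives the copy numbers shown in parentheses in Table I for these digests.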



Long-range map of the W regions 

The analysis of the organization of the W region was carried 
out with DNA from the prostate carcinoma cell line PC-3 
(Kaighn et al., 1979), as PC-3 DNA seems to contain more 
unmethylated restriction sites than DNA from other sources 
(Lorenz et al., 1987), allowing the use of methylation 
sensitive restriction nucleases for long-range mapping. 

A remarkable feature of the PFG blots is that all W region 
probes tested hybridize only to two NotI fragments, 250 and 
600 kb in size. Vκ gene probes hybridize mainly with two 
additional NotI fragments which contain the κ locus (Lorenz 
et al., 1987). The identification of NotI sites in cosmid clones 
of Wa and Wc, which are also present in PC-3 DNA (NotI 
sites at map positions 550 and 1150 kb of Figure 3), has 
been useful in the construction of the long-range map. 

The PFG studies did not reveal any evidence for the 
existence of additional W regions. According to the PFG 
experiments, the four W amplicons are organized as an array 
of large inverted repeats. 
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Fig. 2. Determination of the number of W contigs. In (a) the construct 
m654-1/I-1, which was used for the hybridizations, is shown. A 
1.1 kb EcoRI-SacI fragment from pI-1, which is unique for the Cκ 
region (Klobeck et al., 1984), has been cloned into the W-specific 
probe m654-1. Southern blot analyses with m654-1/I-1 are shown in 
(b). Each lane contains 10 µg of placenta DNA digested with the 
indicated restriction nucleases. The W and Cκ derived fragments are 
assigned to the respective bands, the sizes of which are given in kb. 
For the quantitative evaluation see Table I. 



Cloning and sequence analyses of joints between 
amplified and non-amplified DNA 

Four joints between amplified and non-amplified DNA can 
be identified within the W regions by comparing the 
respective restriction maps (transitions between bars and lines 
in Figure 1). By cloning joint-containing fragments into 
plasmid (p165-3, p654-3, p654-4) or M13 vectors (m177-1, 
m168-1) and comparing the fine maps, fragments suitable 
for the sequence analyses of the joints were identified and 
subcloned. 

The break-off in homology between Wa and Wb was 
localized by aligning the map of Wa with that of the 5' 
amplified unit of Wb (amplicon III) and comparing the 
restriction maps of p165-3 and p654-3. The joints of the 
amplicons II and III in Wb were localized in a similar manner 
by comparing the maps of p654-3 and p654-4. To localize 
the Wc joint, the maps of m168-1 and m177-1 were aligned 
(Figure 1). The sequences of the analysed joints are shown 
in Figure 4. It is evident from these data that Alu repeats 



Table I. Determination of the number of Vκ gene-containing W 
amplicons 

Restriction    Size of bands(a)   Region(a)   Counts/min(b)   Copy number(c) 
digests        (kb) 
BamHI          10.0               Cκ          14 ± 10         1.0 
               8.0                Wa+Wb       47 ± 9          3.5 (3-4) 
BglII          9.5                Cκ          21 ± 10         1.0 
               6.6                Wa+Wb       58 ± 9          2.8 (3) 
BamHI/BglII    7.0                Cκ          43 ± 7          1.0 
XhoI           5.3                Wb          87 ± 7          2.0 (2) 
               3.5                Wa          32 ± 8          0.7 (1) 
BglII/SphI     4.8                Cκ          40 ± 7          1.0 
               2.5                Wb          80 ± 7          2.0 (2) 
               1.5                Wa          35 ± 8          0.9 (1) 

(a) Figure 2b. 
(b) The values are corrected for blank values as described in Materials 
and methods. The standard deviations are indicated. 
(c) The copy number for Cκ is taken as 1.0. The W/Cκ ratios are 
calculated by dividing the c.p.m. of the W containing bands by the 
one for Cκ; the resulting numbers of W copies are given in 
parentheses. 

played a role in the recombination processes, which led to 
the formation of the novel joints. 

The W contigs reside on the long arm of 
chromosome 2 

As we have not been able to link the W contigs to the κ locus 
by chromosomal walking or PFG, we tried to determine 
the chromosomal location of W by analysing somatic cell 
hybrids. 

In one set of experiments, mouse-human hybrid cell 
lines, which contain only parts of human chromosome 2, 
were analysed with probes specific for the κ locus or the 
W regions. In one of the lines, JI4-2L (Eriksson et al., 1983), 
only that part of the short arm of chromosome 2 is present 
that comprises Cκ to the telomere (2p12-tel). DNA of this 
cell line hybridizes only with a Cκ-specific probe but with 
neither probes derived from the Vκ gene-containing parts 
of the locus nor the W region-specific probe m654-1 
(Zimmer, 1989). 

A second analysed cell line, RRP5-3 (Shiloh et al., 1985), 
represents the almost complementary situation to JI4-2L, as 
RRP5-3 contains a 2p- chromosome, i.e. the long arm and 
only that part of the short arm between the centromere and 
the κ locus in 2p12. DNA of this cell line does not hybridize 
with any of the single-copy probes specific for the κ locus; 
however, it gives a strong signal with m654-1 (Zimmer, 
1989). According to these data, the W contigs are located 
either on the short arm of chromosome 2, between the κ 
locus and the centromere, or on the long arm. 
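
The reasoning from the two hybrid cell lines is a set exclusion: a probe must lie in a segment carried by every line that gives a signal and in no segment carried by a line that gives none. The sketch below illustrates this under simplifying assumptions. Chromosome 2 is split into three coarse segments whose labels are hypothetical, and only the m654-1 results quoted above are used.

```python
# Coarse, hypothetical segment labels for human chromosome 2.
SEGMENTS = {"2p12-tel", "2cen-2p12", "2q"}

# Segments retained in each hybrid line, as described in the text.
HYBRID_CONTENT = {
    "JI4-2L": {"2p12-tel"},
    "RRP5-3": {"2cen-2p12", "2q"},
}

# Hybridization result for the W-specific probe m654-1 in each line.
M654_1_SIGNAL = {"JI4-2L": False, "RRP5-3": True}


def candidate_segments(segments, hybrid_content, signal):
    """Return the segments compatible with the observed signals."""
    candidates = set(segments)
    for line, present in hybrid_content.items():
        if signal[line]:
            # Probe detected: it must lie in a segment this line carries.
            candidates &= present
        else:
            # Probe absent: it cannot lie in any segment this line carries.
            candidates -= present
    return candidates


if __name__ == "__main__":
    print(sorted(candidate_segments(SEGMENTS, HYBRID_CONTENT, M654_1_SIGNAL)))
```

The two remaining candidates correspond to the two alternatives named in the text; the hybrid data alone cannot distinguish between them.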

A more precise localization of the W regions was achieved 
by in situ hybridizations; these results are shown in Figure 5. 
The κ locus specific probe pC-2 (Klobeck et al., 1984) maps 
to the short arm of chromosome 2 in the region 2p11-2p12 
in accordance with the location of the κ locus (Malcolm 
et al., 1982; McBride et al., 1982). The W-derived cosmid 
clones cos178, cos165 (Figure 1) and cos140 (Pohlenz et al., 
1987) as well as the subclone m654-1 (data not shown) map 
to a region close to the centromere on the long arm of 
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Fig. 3. Long range map of the W contigs. The map was constructed on the basis of numerous PFG blots, most of which were hybridized 
consecutively with different probes in order to find out whether certain fragments are recognized by more than one probe [a list of fragments 
hybridizing with W-specific probes can be found in Zimmer (1989)]. The shown orientation of Wb is arbitrary, as there is no cloned restriction site 
within Wb which would allow one to determine its orientation. Wa and Wc were oriented by identification of restriction sites for NotI and NruI in 
cosmids of the Wa contig, and of a NotI site in cosmids of the Wc contig. These sites are also present in DNA of PC-3 which is used in the PFG 
experiments (map positions 500, 550 and 1150 kb). The distance between the Wa and Wc contigs is defined by the 600 kb NotI fragment which 
hybridized with m171-3 and m177-1. The contig Wb is placed in the centre of this fragment since we assume about equal sizes of the amplicons (see 
Discussion). The 500 and 700 kb NruI fragments are defined by hybridization with m654-1, m171-3 and m177-1. The amplicons I-IV are shown as 
black bars. Arrows indicate the amplicon orientation based on the transcriptional orientation of Vκ genes (I-III); the orientation of amplicon IV is 
based on its restriction map which is homologous to that of amplicon I (Figure 1). Filled and open triangles mark the regions to which the indicated 
probes hybridize. Open symbols indicate that the hybridizing region is not represented by cosmid clones. The terminal part of the map without 
known restriction sites is shortened (-//-). The analysed cell line PC-3 (Kaighn et al., 1979) is heterozygous for the NruI site marked by a 
rhombus. The linking of Wa to Wb and Wb to Wc is based on those restriction fragments marked with larger letters [for details see Zimmer 
(1989)]. 



chromosome 2 (2cen-q11). This position is clearly 
distinct from the κ locus. 



Discussion 

Size, copy number and organization of the W region 
amplicons 

According to the PFG experiments and the copy number 
determinations, four amplified units hybridizing with W 
region-specific probes reside within a DNA stretch of 650 kb 
and are organized as an array of inverted repeats. The cloned 
parts of the amplicons I-IV are 50-100 kb each (Figure 1). 
If the length of non-amplified DNA between the amplicons 
I and II and between III and IV (Figure 3) is similar to that 
between the amplicons II and III, the average size of the 
amplicons is in the range 110-160 kb. The chromosomal 
organization of the W amplicons, i.e. variable amplicon size, 
head-to-head or tail-to-tail orientation of the amplified units 
and stretches of non-amplified DNA in between two 
amplicons, is very similar to the organization of, for 
example, the amplified genes of dihydrofolate reductase 
(DHFR), CAD, c-myc and adenylate deaminase (AMPD) 
(DHFR, Ma et al., 1988; Heartlein and Latt, 1989; CAD, 
Ardeshir et al., 1983; Ford and Fried, 1986; c-myc, Ford 
and Fried, 1986; AMPD, Hyrien et al., 1988). 

One major difference between the organization of the W 
regions and the amplified units reported in the literature is 
the low copy number combined with a rather small amplicon 
size. The amplicons described so far either exist in a low copy 
number, which is the case after first step selection, but have 
a size of up to 10 000 kb, as found for CAD amplicons 
(Giulotto et al., 1986), or have a size in the range of that of 
the W region amplicons, but reach a copy number of up to 
2600 copies per cell, as has been described for AMPD amplicons 
(Yeung et al., 1983). Whether these differences reflect a 
different mechanism of amplification responsible for the 
generation of the W regions is discussed in a following 
section. 



Novel joints are formed within Alu repeats 

Relatively few studies define the DNA sequences at the sites 
of recombination associated with amplification (for review, 
see Stark et al. , 1989). The sequence analyses of novel joints 
revealed the existence of partial Alu repeats. This finding 
makes it very likely that the repeats played a role either 
during the amplification process itself or in recombinations, 
which are necessary to resolve aberrant replication bubbles 
into a linear array of amplified units. Depending on the 
assumed mechanism of amplification (for review, see Stark 
et al., 1989), one can imagine several ways in which Alu 
repeats participate in the amplification process. In the context 
of strand switch models (Nalbantoglu and Meuth, 1986; 
Hyrien et al., 1988), Alu repeats could serve as those 
sequences which promote the strand switch by the DNA 
polymerase within the replication fork. According to such 
a model, one would expect to find partial Alu repeats at the 
amplification joints if the strand switch event did not occur 
at identical positions on the leading and lagging strand. 

In the course of recombination, which is always associated 
with DNA amplification, Alu repeats might serve as target 
sites for the recombinations. The involvement of Alu repeats 
in genomic recombinations has been reported frequently (see 
literature cited in Hyrien et al. , 1987). 

Although we cannot decide at which step during the 
generation of the W amplicons the Alu repeats played a role, 
these repetitive sequence elements seemed to be important 
for the amplification of a W region precursor. 
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s- 2 wusw . . . 
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TGAACCQ3GGAGGTGGAGGTTGCAGTGAGCCGAGATCGCGC(^TGCACTCCAGCCTGGKGACAGAGCGAGACTCCGTCTC 
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( IV)* TGCCAGGTGTGTGAGGOTGGGCCMTCTKGGCACCAC 

( I ) GCAACTTCAGCAAAGTCTCAAGTTAWAMTTAATGTGCAAAAATTGTTATAKA^ 



ConAlu 



Fig. 4. Sequences of fragments spanning joints between amplified and non-amplified DNA. The sequences are aligned to show maximal homology. 
Nucleotide positions which are identical between two sequences are marked by vertical bars. The regions from which the sequences are derived are 
indicated. ConAlu is a consensus sequence of human Alu repeats (Kariya et al., 1987). The break-off in homology between two sequences is marked 
by a filled triangle. Those sequences which are believed to be the result of the recombinations associated with the generation of respective amplicons 
('recombined sequences') are marked by asterisks. The non-marked sequences have to be considered as 'reference sequences', which allow detection 
of the nucleotide position in the recombined sequence at which amplified and non-amplified DNA is joined. The sequences including some extensions 
that are not shown here are being transmitted to the EMBL Data Library. The sequencing strategies are described in Zimmer (1989). Both strands 
were sequenced in all cases. The sequence Wa in (a) is derived from a subcloned fragment of p165-3, spanning the joint between amplicon I and 
non-amplified DNA; the 'reference sequence' Wb is derived from a subcloned fragment of p654-3 (Figure 1). In (b) the sequence 5' Wb is derived 
from a fragment that contains the border of amplicon III; it was subcloned from p654-3 (Figure 1; Zimmer, 1989). The reference sequence, 3' Wb, 
is derived from a fragment that contains the border of amplicon II; the fragment was subcloned from p654-4 (Figure 1). The sequence Wc in (c) is 
derived from a subcloned fragment of m177-1 which contains the break-off in homology to m168-1 (Figure 1). The boxed block of 56 nucleotides in 
Wa shows 90% sequence identity to the consensus sequence of human Alu repeats; the 5' part of the complementary strand is shown. 



The W regions: products of two successive 
amplifications 

The results of sequence comparisons of genes derived from 
amplicons I-III make it very likely that these three 
amplicons were generated at different times and, hence, in 
a multistep process during evolution (Zimmer et al., 1990). 
According to these data, the amplicons II and III, which form 
the large inverted repeat within Wb (Figure 1), were formed 
later in evolution than amplicon I. This conclusion is also 
supported by the results of restriction map comparisons of 
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Fig. 5. Localization of the W contigs by in situ hybridization. In (a) 
the grain distribution on chromosome 2 is shown; on the left side for 
the W-derived cosmid clone cos165 and on the right side for the Cκ 
probe pC-2. For the probes cos178, cos140 and m654-1 the same 
localization was found as shown for cos165. In (b) arranged pairs of 
chromosome 2 are shown after hybridization with the indicated probes. 
Cosmid probes were detected by a non-radioactive procedure, pC-2 
and m654-1 after radioactive labelling. 



amplicons I-III, which show that of 29 sites mapped within 
a stretch of 40 kb only three sites differ between amplicons 
II and III. It is very likely that the three differences were 
created by a single event only, which led to a deletion of 
1 kb in amplicon II (Figure 1). More differences are found 
between amplicons II and I (10 out of 33 sites differ) and 
between III and I (seven out of 33 sites differ). The 
evolutionary relationship of amplicon IV to the other three 
amplicons is hard to prove with this strategy, as those parts 
of amplicons II and III that are homologous to amplicon IV 
have not yet been cloned (Figure 1). 
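
The restriction site counts quoted above can be turned into a rough divergence measure, the fraction of mapped sites that differ, to make the comparison between amplicon pairs explicit. This is an illustrative sketch only: the names are hypothetical, and the authors themselves point out that the three II/III differences probably stem from a single deletion, which a simple site count cannot capture.

```python
# (differing sites, sites compared) for each amplicon pair, from the text.
SITE_COMPARISONS = {
    ("II", "III"): (3, 29),
    ("I", "II"): (10, 33),
    ("I", "III"): (7, 33),
}


def fraction_differing(differing, compared):
    """Fraction of compared restriction sites that differ."""
    if compared <= 0:
        raise ValueError("number of compared sites must be positive")
    return differing / compared


def most_similar_pair(comparisons):
    """Return the amplicon pair with the smallest fraction of differing sites."""
    return min(comparisons, key=lambda pair: fraction_differing(*comparisons[pair]))


if __name__ == "__main__":
    for pair, counts in SITE_COMPARISONS.items():
        print(pair, round(fraction_differing(*counts), 3))
    print("most similar:", most_similar_pair(SITE_COMPARISONS))
```

On these counts amplicons II and III are the closest pair, in line with the conclusion that they were formed later than amplicon I.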

A chain of events which could have led to the generation 
of the W regions during evolution is shown in Figure 6. One 
assumption of the scheme is that the W region precursor had 
a structure similar to that part of Wb which contains the genes 
W5 to W9. This is reasonable to assume, as the five genes 
have the same transcriptional orientation (Figure 1), which 
also holds true for most genes of the κ locus, from where 
the W precursor must be derived. In a first step, a section 
of ~150 kb with the genes W7, W8 and W9 is duplicated 
in such a way that a large palindrome is formed. The newly 
generated genes 7', 8' and 9' represent the genes W1, W2 
and W3 of Wa. In a second step, which led to the duplication 
of a large block of DNA of ~300 kb, a large palindrome 
is again formed. A third copy of the genes 7, 8 and 9 (7", 
8" and 9") is generated; these copies represent W4, W10 
and W11 of Wb. Amplicon IV is a copy of the 3' part of 
amplicon I, which has been formed in the first step. 




Fig. 6. Model of the generation of the W regions in a multistep 
process. The precursor contained five genes, marked by filled dots. 
Arrows indicate transcriptional orientations of genes. The section 
which becomes duplicated in the course of the first amplification event 
and the generated amplicons are drawn as hatched arrows with the 
arrow-heads pointing to the 3' end with respect to the orientation of 
the genes. The brackets indicate that this structure does not exist in the 
genome anymore. That part of the intermediate product which is 
duplicated in a second event and the resulting amplicons are marked 
by dotted arrows. The four generated amplicons are termed I-IV in 
accordance with Figures 1 and 3. Genes generated in the first step are 
marked by ', those of the second step by ". The gene numbers of the 
present day W regions are given in parentheses. Parts of Alu repeats 
that have been detected at the joints are indicated. At the amplicon I 
joint, the 5' part of an Alu repeat is part of the amplicon (hatched 
rectangle), the deleted 3' part is marked by an open rectangle with a 
dotted line. At the joint of amplicon III, the 3' part of an Alu repeat 
is present (open rectangle); it is part of the non-amplified DNA 5' of 
amplicon III. At the amplicon IV joint only a few nucleotides of an 
Alu repeat are present (Figure 4c). The deleted 5', as well as 3', 
sequences are marked by open rectangles with dotted lines. The lower 
panel shows a long-range map of the W regions, similar to that shown 
in Figure 3. The open parts of the arrows, which indicate amplicons 
I-IV, represent those amplified DNA stretches that have not yet been 



A remarkable feature of this series of amplifications is that 
in the first, as well as in the second, step only one additional 
copy is formed. This is in contrast to the generation of other 
amplified units, where the first step of selection yields only 
a few additional copies; second step selections, and, hence, 
secondary amplifications, however, usually result in high 
copy numbers (Saito et al., 1989). Alternatively, the low 
W copy numbers can be explained by secondary deletions 
of most of the W copies formed in the amplification events. 
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Transposition and amplification of the W regions 

For the generation of the W regions we propose the following 
chain of events: (i) the W region precursor, containing the 
genes W5 to W9 (Figure 6), was transposed to the long arm 
of chromosome 2 and (ii) the new chromosomal location of 
the precursor somehow promoted its stepwise amplification. 

A pericentric inversion, involving chromosomal bands 2q1 
and 2p1, could have been responsible for the transposition 
of the five Vκ genes from the κ locus on the short arm to 
the long arm of the chromosome. Such a chromosomal 
rearrangement is observed at the evolutionary progenitor of 
the chimpanzee (Yunis and Prakash, 1982). Transposition 
of Vκ genes by such a process is consistent with a finding 
by Graninger et al. (1988), who detected another part of the 
κ locus, a copy of the κ-deleting element (κde; Siminovitch 
et al., 1985), on the long arm of chromosome 2 in 2q11. 
As 2q11 is the same band in which 
the W regions are located, one can speculate that the two 
regions, the W precursor and κde, were transposed to the 
long arm by the same process. Whereas a copy of the κde 
is still located 23 kb 3' of Cκ (Klobeck and Zachau, 1986), 
we do not have any evidence that a copy of the W regions 
exists within the present day κ locus. 

As an alternative to the transposition by a pericentric 
inversion, the Vκ genes could have been transposed via 
episomes. Such a mechanism has been proposed to be 
responsible for the amplification of DHFR and mdr1 
amplicons in some cell lines (Carroll et al., 1988; Ruiz and 
Wahl, 1988). While an episome mediated mechanism seems 
likely for the orphon Vκ genes on chromosomes 1, 22 and 
other chromosomes, we prefer for the W regions the idea 
of a pericentric inversion. 

Data supporting the second assumption, i.e. the new 
chromosomal location promoted gene amplification, have 
been reported by Wahl et al. (1984), who found an 
influence of the chromosomal position of transfected CAD 
genes on the frequency of CAD amplification. 

For the amplification event itself, resulting in the formation 
of palindromic structures, mechanisms such as those recently 
reviewed by Stark et al. (1989) may be responsible. 
However, to explain the generation of only one additional 
copy per amplification step, one has to postulate that no 
replication processes yielding high copy numbers took place 
in the course of the W region amplification. 

Concluding remarks 

Pericentric inversions of chromosome 2 involving the 
chromosomal segments 2p11-2q13 are observed by 
cytogenetic methods in about 0.1% of today's population 
(Djalali et al., 1986 and earlier literature). Also, de novo 
inversions have been observed at chromosome 2 (Vejerslev 
and Friedrich, 1984), indicating that the inversion is a 
spontaneous event. We consider the possibility that extensive 
sequence homologies between parts of the κ locus on 2p12 
and the κ locus derived W and κde regions on 2qcen-q11 
promote the inversions. Interstitial telomere-like repeats, 
found in 2q11-2q14 (Allshire et al., 1988), might also 
contribute to the spontaneously occurring pericentric 
inversions observed in the present day human population. 

One of the structural features of the W regions, i.e. the 
organization of different copies as large palindromes, is also 
found for the κ locus itself (Lorenz et al., 1987). It is 
tempting to speculate that a duplication involving large parts 
of the κ locus followed the same molecular mechanisms as 
the amplification of the transposed W precursor. Proving 
this hypothesis should be feasible by identifying and 
analysing the junctions between duplicated and non-duplicated 
sections, and by cloning the head-to-head junction 
of the two copies forming the κ locus. 

Materials and methods 

Recombinant DNA and restriction maps 

The recombinant cosmids were isolated from libraries described by Pohlenz 
et al. (1987) and Lorenz (1989). Colony hybridization was performed as 
described previously (Klobeck et al., 1987). Restriction maps and subclones 
were constructed using established methods (Maniatis et al., 1982). Cosmid 
clones were characterized as described in Pohlenz et al. (1987). 

Nucleic acid hybridization and copy number determination 

DNA transfer was performed according to Reed and Mann (1985), except 
for PFG, where the protocol of Rigaud et al. (1987) was used. Final washing 
of filters after hybridization was at 68°C with 40 mM phosphate, pH 7.2, 
1% sodium dodecylsulphate. 

For copy number determinations of amplicons, the insert of clone 
m654-l/I-l (Figure 2a) was isolated as a 2.2 kb EcoRI-BamHI fragment 
on an agarose gel and labelled according to Feinberg and Vogelstein (1983). 
Digested genomic DNA (10 µg) was electrophoresed, transferred to filters 
and hybridized. After exposure for 1 day, bands were cut out. For blank 
values, filter strips above and beneath each band were used. The filter bound 
activities were measured in a liquid scintillation counter. 
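
The blank correction described above can be sketched in a few lines of Python. This is only an illustration: the source says that filter strips above and beneath each band served as blanks, but it does not state how copy numbers were then derived from the counts. Averaging the two blanks and taking a ratio to a single-copy reference band are assumptions made here, and all function names and count values are hypothetical.

```python
def net_counts(band_cpm, blank_above_cpm, blank_below_cpm):
    """Background-corrected counts for one excised band.

    Assumption: the blank value is the mean of the strips cut from
    above and beneath the band.
    """
    blank = (blank_above_cpm + blank_below_cpm) / 2.0
    return band_cpm - blank


def relative_copy_number(sample_net, reference_net, reference_copies=1.0):
    """Copy number relative to a reference band of known copy number.

    Assumption: signal is proportional to copy number; the source does
    not describe this normalisation step.
    """
    if reference_net <= 0:
        raise ValueError("reference band has no signal above background")
    return reference_copies * sample_net / reference_net


# Hypothetical counts per minute, for illustration only.
sample = net_counts(1250.0, 48.0, 52.0)
reference = net_counts(650.0, 50.0, 50.0)
print(relative_copy_number(sample, reference))
```

With these made-up counts the sample band carries twice the signal of the reference band after blank subtraction.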

PFG 

Long-range mapping of DNA from the prostate carcinoma cell line PC-3 
(Kaighn et al., 1979) using rare cutting enzymes and orthogonal field gel 
electrophoresis was performed as described previously (Lorenz et al., 1987). 

DNA sequencing 

For sequencing of M13 subclones the 'Sequenase' DNA sequencing kit (US 
Biochemical Corp., Cleveland, OH) was used according to the 
manufacturer's instructions. 

Chromosome banding and in situ hybridization 

Metaphase chromosomes were prepared from PHA-stimulated blood 
lymphocytes. The probes m654-l and pC-2 were radiolabelled with 
[3H]dTTP and [3H]dCTP, and hybridization was done as described in 
Adolph et al. (1987). A non-radioactive detection method was used for the 
localization of cosmid probes. The cosmid clones were linearized and labelled 
by the random priming technique with biotinylated dUTP-11. Repetitive 
DNA sequences within the genomic insert were saturated by prehybridization 
with a 100-fold excess of human Cot-1 DNA for 4 h (Landegent et al., 
1987). The normal in situ hybridization was performed after prehybridization. 

For probe detection the slides must remain humid. They were rinsed in 
BT buffer (0.1 M sodium bicarbonate, 0.1% Triton-X-100 pH 8.0) and 
preincubated for 5 min in BT, 0.1% non-fat dry milk. The slides were treated 
with peroxidase labelled streptavidin (Enzo Biochemicals Inc., New York; 
20 µl per 450 µl BT) and probe detection was done with diaminobenzidine 
according to the protocol supplied by the manufacturer. The slides were 
counter-stained with methylene green. 



Acknowledgements 

F.-J.Zimmer was the holder of an A.Butenandt fellowship. We thank Drs 
G.Bruns and J.Erikson for providing DNA of the cell lines RRP5-3 and 
JI4-2L, respectively, and B.Bauriedel for expert assistance. The work was 
supported by Bundesministerium für Forschung und Technologie (Center 
Grant O3I62O0A2) and Fonds der Chemischen Industrie. 



References 

Adolph,S., Bartram,C.R. and Hameister,H. (1987) Cytogenet. Cell Genet., 
44, 65-68. 
Allshire,R.C., Gosden,J.R., Cross,S.H., Cranston,G., Rout,D., 
Sugawara,N., Szostak,J.W., Fantes,P.A. and Hastie,N.D. (1988) Nature, 
322, 656-659. 
Ardeshir,F., Giulotto,E., Zieg,J., Brison,O., Liao,W.S.L. and Stark,G.R. 
(1983) Mol. Cell. Biol., 3, 2076-2088. 
Carroll,S.M., DeRose,M.L., Gaudray,P., Moore,C.M., 
Needham-Vandevanter,D.R., Von Hoff,D.D. and Wahl,G.M. (1988) Mol. Cell. 
Biol., 8, 1525-1533. 
Childs,G., Maxson,R., Cohn,R.H. and Kedes,L. (1981) Cell, 23, 651-663. 
Djalali,M., Steinbach,P., Bullerdiek,J., Holmes-Siedle,M., 
Verschraegen-Spae,M.R. and Smith,A. (1986) Hum. Genet., 11, 32-36. 
Erikson,J., Nishikura,K., Ar-Rushdi,A., Finan,J., Emanuel,B., Lenoir,G., 
Nowell,P.C. and Croce,C.M. (1983) Proc. Natl. Acad. Sci. USA, 80, 
7581-7585. 
Feinberg,A.P. and Vogelstein,B. (1983) Anal. Biochem., 132, 6-13. 
Ford,M. and Fried,M. (1986) Cell, 45, 425-430. 
Giulotto,E., Saito,I. and Stark,G.R. (1986) EMBO J., 5, 2115-2121. 
Graninger,W.B., Goldman,P.L., Morton,C.C., O'Brien,S.J. and 
Korsmeyer,S.J. (1988) J. Exp. Med., 167, 488-501. 
Heartlein and Latt (1989) Nucleic Acids Res., 17, 1697-1716. 
Hyrien,O., Debatisse,M., Buttin,G. and de Saint Vincent,B.R. (1987) 
EMBO J., 6, 2401-2408. 
Hyrien,O., Debatisse,M., Buttin,G. and de Saint Vincent,B.R. (1988) 
EMBO J., 7, 407-417. 

Kariya,Y., Kato,K., Hayashizaki,Y., Himena,S., Tarui,S. and Matsubara,K. 
(1987) Gene, 53, 1-10. 
Kaighn,M.E., Narayan,K.S., Ohnuki,Y., Lechner,J.F. and Jones,L.W. 
(1979) Invest. Urol., 17, 16-23. 
Klobeck,H.-G. and Zachau,H.G. (1986) Nucleic Acids Res., 14, 
4591-4603. 
Klobeck,H.-G., Combriato,G. and Zachau,H.G. (1984) Nucleic Acids Res., 
12, 6995-7006. 
Klobeck,H.-G., Zimmer,F.-J., Combriato,G. and Zachau,H.G. (1987) 
Nucleic Acids Res., 15, 9655-9665. 
Landegent,J.E., Jansen in de Wal,N., Dirks,R.W., Baas,F. and van der 
Ploeg,M. (1987) Hum. Genet., 77, 366-370. 
Lötscher,E., Grzeschik,K.-H., Bauer,H.G., Pohlenz,H.-D., Straubinger,B. 
and Zachau,H.G. (1986) Nature, 320, 456-458. 
Lötscher,E., Zimmer,F.-J., Klopstock,T., Grzeschik,K.-H., Jaenichen,R., 
Straubinger,B. and Zachau,H.G. (1988) Gene, 69, 215-223. 
Lorenz,W. (1989) PhD Thesis, Fakultät für Biologie der Universität 
München. 
Lorenz,W., Straubinger,B. and Zachau,H.G. (1987) Nucleic Acids Res., 
15, 9667-9676. 
Ma,L., Looney,J.E., Leu,T.-H. and Hamlin,J.L. (1988) Mol. Cell. Biol., 
8, 2316-2327. 

Maeda,N. and Smithies,O. (1986) Annu. Rev. Genet., 20, 81-108. 
Malcolm,S., Barton,P., Murphy,C., Ferguson-Smith,M.A., Bentley,D.L. 
and Rabbitts,T.H. (1982) Proc. Natl. Acad. Sci. USA, 79, 4957-4961. 
Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning: A 
Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring 
Harbor, NY. 
McBride,O.W., Hieter,P.A., Hollis,G.F., Swan,D., Otey,M.C. and 
Leder,P. (1982) J. Exp. Med., 155, 1480-1490. 
Meindl,A. (1990) PhD Thesis, Fakultät für Biologie der Universität 
München. 
Nalbantoglu,J. and Meuth,M. (1986) Nucleic Acids Res., 14, 8361-8371. 
Pech,M. and Zachau,H.G. (1984) Nucleic Acids Res., 12, 9229-9236. 
Pohlenz,H.-D. (1986) PhD Thesis, Fakultät für Chemie und Pharmazie. 
Pohlenz,H.-D., Straubinger,B., Thiebe,R., Pech,M., Zimmer,F.-J. and 
Zachau,H.G. (1987) J. Mol. Biol., 193, 241-253. 
Reed,K.C. and Mann,D.A. (1985) Nucleic Acids Res., 13, 7207-7221. 
Rigaud,G., Grange,T. and Pictet,R. (1987) Nucleic Acids Res., 15, 857. 
Ruiz,J.C. and Wahl,G.M. (1988) Mol. Cell. Biol., 8, 4302-4313. 
Saito,I., Groves,R., Giulotto,E., Rolfe,M. and Stark,G.R. (1989) Mol. Cell. 
Biol., 9, 2445-2452. 
Shiloh,Y., Shipley,J., Brodeur,G.M., Bruns,G., Korf,B., Donlan,T., 
Schreck,R.R., Seeger,R., Sakai,K. and Latt,S.A. (1985) Proc. Natl. 
Acad. Sci. USA, 82, 3761-3765. 
Siminovitch,K.A., Bakhshi,A., Goldman,P. and Korsmeyer,S.J. (1985) 
Nature, 316, 260-262. 
Skowronski,J., Fanning,T.G. and Singer,M.E. (1988) Mol. Cell. Biol., 8, 
1385-1397. 
Stark,G.R., Debatisse,M., Giulotto,E. and Wahl,G.M. (1989) Cell, 57, 
901-908. 
Straubinger,B., Thiebe,R., Pech,M. and Zachau,H.G. (1988) Gene, 69, 
209-214. 
Vejerslev,L.O. and Friedrich,U. (1984) Prenat. Diagn., 4, 181-186. 
Wahl,G.M., de Saint-Vincent,B.R. and Rose,M.L. (1984) Nature, 307, 
516-520. 
Yeung,C.-Y., Frayne,E.G., Al-Ubaidi,M.R., Hook,A 
Wright,D.A. and Kellems,R.E. (1983) J. Bi, 
15179-15185. 
Yunis,J.J. and Prakash,O. (1982) Science, 215, 1525-1530. 
Zachau,H.G. (1990) Biol. Chem. Hoppe-Seyler, 371, 1-6. 
Zachau,H.G. (1989) In Honjo,T., Alt,F.W. and Rabbitts,T. (eds), The 
Immunoglobulin Genes. Academic Press, London, pp. 91-109. 
Zimmer,F.-J. (1989) PhD Thesis, Fakultät für Chemie und Pharmazie, 



Received on December 27, 1989; revised on February 2, 1990 







J. Mol. Biol. (1995) 247, 536-540 




COMMUNICATION 

SCOP: A Structural Classification of Proteins Database 
for the Investigation of Sequences and Structures 

Alexey G. Murzin, Steven E. Brenner, Tim Hubbard and Cyrus Chothia* 



MRC Laboratory of Molecular 
Biology and Cambridge 
Centre for Protein 
Engineering, Hills Road 
Cambridge CB2 2QH 
England 



*Corresponding author 



To facilitate understanding of, and access to, the information available for 
protein structures, we have constructed the Structural Classification of 
Proteins (scop) database. This database provides a detailed and com- 
prehensive description of the structural and evolutionary relationships of 
the proteins of known structure. It also provides for each entry links to 
co-ordinates, images of the structure, interactive viewers, sequence data and 
literature references. Two search facilities are available. The homology search 
permits users to enter a sequence and obtain a list of any structures to which 
it has significant levels of sequence similarity. The key word search finds, for 
a word entered by the user, matches from both the text of the scop database 
and the headers of Brookhaven Protein Databank structure files. The 
database is freely accessible on World Wide Web (WWW) with an entry point 
to URL http://scop.mrc-lmb.cam.ac.uk/scop/ 

scop: an old English poet or minstrel (Oxford English Dictionary); 
скоп: pile, accumulation (Russian Dictionary). 
Keywords: protein families; superfamilies; folds; evolutionary 
relationships 



Nearly all proteins have structural similarities 
with other proteins and, in many cases, share a 
common evolutionary origin. The knowledge of 
these relationships makes important contributions to 
molecular biology and to other related areas of 
science. It is central to our understanding of the 
structure and evolution of proteins. It will play an 
important role in the interpretation of the sequences 
produced by the genome projects and, therefore, in 
understanding the evolution of development. 

The recent exponential growth in the number of 
proteins whose structures have been determined by 
X-ray crystallography and NMR spectroscopy 
means that there is now a large and rapidly growing 
corpus of information available. At present (January 
1995) the Brookhaven Protein Databank (PDB, 
(Abola et al., 1987)) contains 3091 entries and the 
number is increasing by about 100 a month. To 
facilitate the understanding of, and access to, this 
information, we have constructed the Structural 
Classification of Proteins (scop) database. This 
database provides a detailed and comprehensive 
description of the structural and evolutionary 
relationships of proteins whose three-dimensional 
structures have been determined. It includes all 



Abbreviations used: PDB, Protein Databank; scop, 
Structural Classification of Proteins. 



proteins in the current version of the PDB and 
almost all proteins for which structures have been 
published but whose co-ordinates are not available 
from the PDB. 

The classification of protein structures in the 
database is based on evolutionary relationships and 
on the principles that govern their three-dimensional 
structure. Early work on protein structures showed 
that there are striking regularities in the ways in 
which secondary structures are assembled (Levitt 
& Chothia, 1976; Chothia et al., 1977) and in the 
topologies of the polypeptide chains (Richardson, 
1976, 1977; Sternberg & Thornton, 1976). These 
regularities arise from the intrinsic physical and 
chemical properties of proteins (Chothia, 1984; 
Finkelstein & Ptitsyn, 1987) and provide the basis for 
the classification of protein folds (Levitt & Chothia, 
1976; Richardson, 1981). This early work has been 
taken further in more recent papers; see, for example, 
Holm & Sander (1993), Orengo et al. (1993), 
Overington et al. (1993) and Yee & Dill (1993). An 
extensive bibliography of papers on the classification 
and the determinants of protein folds is given in scop. 

The method used to construct the protein 
classification in scop is essentially the visual 
inspection and comparison of structures though 
various automatic tools are used to make the task 
manageable and help provide generality. Given the 
current limitations of purely automatic procedures, 
we believe this approach produces the most 
accurate and useful results. The unit of classification 
is usually the protein domain. Small 
proteins, and most of those of medium size, have 
a single domain and are, therefore, treated as a 
whole. The domains in large proteins are usually 
classified individually. 

Figure 1. In scop, the unit of classification is usually the 
protein domain. Small proteins, and most of those of 
medium size, have a single domain and are, therefore, 
treated as a whole. The domains in large proteins are 
usually classified individually. The protein entries in the 
December 1994 release of the Brookhaven Protein Databank (PDB) 
contain 3179 domains. Many of these are forms of the 
same protein whose differences are not significant in terms 
of the classification used here; for example, they have 
different bound ligands or engineered mutations. To 
distinguish between these and structures of the same 
protein from different organisms, proteins listed within a 
family are subclassified by species. Classification of the 
3179 domains shows that they come from 498 families that 
can be clustered into 366 superfamilies and 279 different 
folds. In addition to these, scop contains entries for 195 
proteins that do not have atomic co-ordinates available 
from the PDB at present but for which descriptions of their 
structures have been published. 

The classification is on hierarchical levels that 
embody the evolutionary and structural relation- 
ships. 

FAMILY. Proteins are clustered together into 
families on the basis of one of two criteria that imply 
their having a common evolutionary origin: first, all 
proteins that have residue identities of 30% and 
greater; second, proteins with lower sequence 



identities but whose functions and structures are 
very similar; for example, globins with sequence 
identities of 15%. 
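
The FAMILY criteria above can be illustrated with a short Python sketch. It is not the scop procedure, which the text describes as essentially visual inspection aided by automatic tools. The function names are hypothetical, and the choice to compute identity over aligned, non-gap positions is an assumption, since the text does not define the denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned, non-gap positions.

    Assumption: both sequences come from the same pairwise alignment and
    therefore have equal length; columns with a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def same_family(identity_pct, similar_structure_and_function=False,
                threshold=30.0):
    """First criterion: identity of 30% and greater. Second criterion:
    lower identity but very similar function and structure, supplied
    here as a human judgement flag."""
    return identity_pct >= threshold or similar_structure_and_function


# Toy alignment (made-up sequences): 9 of 10 positions identical.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
# Globin-like case from the text: 15% identity, but similar structure
# and function.
print(same_family(15.0, similar_structure_and_function=True))
```

The second criterion cannot be reduced to a number, which is why the sketch takes it as an explicit flag rather than trying to compute it.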

SUPERFAMILY. Families, whose proteins have 
low sequence identities but whose structures and, in 
many cases, functional features suggest that a 
common evolutionary origin is probable, are placed 
together in superfamilies; for example, actin, the 
ATPase domain of the heat-shock protein and 
hexokinase (Flaherty et al., 1991). 

COMMON FOLD. Superfamilies and families are 
defined as having a common fold if their proteins 
have the same major secondary structures in the same 
arrangement with the same topological connections. 
In scop we give for each fold short descriptions of its 
main structural features. Different proteins with the 
same fold usually have peripheral elements of 
secondary structure and turn regions that differ in 
size and conformation and, in the more divergent 
cases, these differing regions may form half or more 
of each structure. For proteins placed together in the 
same fold category the structural similarities 
probably arise from the physics and chemistry of 
proteins favouring certain packing arrangements and 
chain topologies (see above). There may, however, 
be cases where a common evolutionary origin is 
obscured by the extent of the divergence in sequence, 
structure and function. In these cases, it is possible 
that the discovery of new structures, with folds 
between those of the previously known structures, 
will make clear their common evolutionary relation- 
ship. 

CLASS. For convenience of users, the different 
folds have been grouped into classes. Most of the 
folds are assigned to one of the five structural classes 
on the basis of the secondary structures of which 
they are composed: (1) all alpha (for proteins whose 
structure is essentially formed by α-helices), (2) all 
beta (for those whose structure is essentially formed 
by β-sheets), (3) alpha and beta (for proteins with 
α-helices and β-strands that are largely interspersed), 
(4) alpha plus beta (for those in which 
α-helices and β-strands are largely segregated) and 
(5) multi-domain (for those with domains of different 
fold and for which no homologues are known at 
present). Note that we do not use Greek characters 
in scop because they are not accessible to all world 
wide web viewers. More unusual proteins, peptides 
and the PDB entries for designed proteins, 
theoretical models, nucleic acids and carbohydrates, 
have been assigned to other classes. 

Table 1 
Facilities and databases to which SCOP has links 

                    Link               URL                               Reference 
Co-ordinates        PDB                http://www.pdb.bnl.gov/           (Abola et al., 1987) 
Static images       SP3D               http://expasy.hcuge.ch/           (Appel et al., 1994) 
                                       gopher://pdb.pdb.bnl.gov/ 
On-the-fly images   NIH molecular      http://www.nih.gov/www94/molrus   (FitzGerald, 1994) 
                    modelling group 
Sequences and       NCBI Entrez        http://www.ncbi.nlm.nih.gov/      (Benson et al., 1993) 
MEDLINE entries 

The scop database contains links to a number of other facilities and databases in the world. Several interactive viewers 
can be linked with scop using PDB co-ordinates. The location and nature of the links will vary as databases evolve and 
relocate. 

Figure 2. A typical scop session is shown on a unix workstation. A scop page of the Interleukin 8-like family is displayed 
by the WWW browser program (NCSA Mosaic) (Schatz & Hardin, 1994). Navigating through the tree structure is accomplished 
by selecting any underlined entry, by clicking on buttons (at the top of each page) and by keyword searching (at the bottom 
of each page). The static image comparing two proteins in this family was downloaded by clicking on the icon indicated 
and is displayed by image-viewer program xv. By clicking on one of the green icons, commands were sent to a molecular 
viewer program (RasMol) written by Roger Sayle (Sayle, 1994), instructing it to automatically display the relevant PDB file 
and colour the domain in question by secondary structure. Since sending large PDB files over the network can be slow, 
this feature of scop can be configured to use local copies of PDB files if they are available. Equivalent WWW browsers, 
image-display programs and molecular viewers are also available free for Windows-PC and Macintosh platforms. 

The number of entries, families, superfamilies and 
common folds in the current version of scop are 
shown in Figure 1. The exact position of boundaries 
between family, superfamily and fold are, to some 
degree, subjective. However, because all proteins 
that could conceivably belong to a family or 
superfamily are clustered together in the encompassing 
fold category, some users may wish to 
concentrate on this part of the database. 
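
The nesting of family within superfamily within fold within class can be pictured as a small tree. The Python sketch below is a hypothetical in-memory model, not the format in which scop is stored or served. The three family names come from the superfamily example given in the text; the class placement and the fold and superfamily names are placeholders chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in the hierarchy: class, fold, superfamily or family."""
    level: str
    name: str
    children: list = field(default_factory=list)

    def add(self, level, name):
        child = Node(level, name)
        self.children.append(child)
        return child

    def count(self, level):
        """Number of nodes at the given level in this subtree."""
        own = 1 if self.level == level else 0
        return own + sum(child.count(level) for child in self.children)


root = Node("root", "scop")
# Placeholder placement and names (assumptions, not taken from scop).
cls = root.add("class", "alpha and beta")
fold = cls.add("fold", "example fold")
superfamily = fold.add("superfamily", "example superfamily")
# Families that the text groups into one superfamily.
for family_name in ("actin",
                    "heat-shock protein ATPase domain",
                    "hexokinase"):
    superfamily.add("family", family_name)

print(root.count("family"), root.count("superfamily"), root.count("fold"))
```

A user who distrusts a particular family or superfamily boundary can still work at the fold node, since every candidate relative sits somewhere beneath it.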

In addition to the information on structural and 
evolutionary relationships, each entry (for which 
co-ordinates are available) has links to images of the 
structure, interactive molecular viewers, the atomic 
co-ordinates, sequence data and homologues and 
MEDLINE abstracts (see Table 1). 

Two search facilities are available in scop. The 
homology search permits users to enter a sequence 
and obtain a list of any structures to which it has 
significant levels of sequence similarity. The key 
word search finds, for a word entered by the user, 
matches from both the text of the scop database and 
the headers of Brookhaven Protein Databank 
structure files. 

To provide easy and broad access, we have made 
the scop database available as a set of tightly coupled 
hypertext pages on the world wide web (WWW). 
This allows it to be accessed by any machine on the 
internet (including Macintoshes, PCs and work- 
stations) using free WWW reader programs, such as 
Mosaic (Schatz & Hardin, 1994). Once such a 
program has been started, it is necessary only to 
"open" URL: 

http://scop.mrc-lmb.cam.ac.uk/scop/ 

to obtain the "home" page level of the database. 

In Figure 2 we show a typical page from the 
database. Each page has buttons to go back to the 
top-level home page, to send electronic mail to the 
authors, and to retrieve a detailed help page. 
Navigating through the tree structure is simple; 
selecting any entry retrieves the appropriate page. In 
addition, buttons make it possible to move within the 
hierarchy in other manners, such as "upwards" to 
obtain broader levels of classification. 

The scop database was originally created as a 
tool for understanding protein evolution through 
sequence-structure relationships and determining if 
new sequences and new structures are related to 
previously known protein structures. On a more 
general level, the highest levels of classification 
provide an overview of the diversity of protein 
structures now known and would be appropriate 
both for researchers and students. The specific lower 
levels should be helpful for comparing individual 
structures with their evolutionary and structurally 
related counterparts. In addition, we have also found 
that the search capabilities with easy access to data 
and images make scop a powerful general-purpose 
interface to the PDB. 

As new structures are released by PDB and 
published, they will be entered in scop and revised 
versions of the database will be made available on 
WWW. Moreover, as our formal understanding of 
relationships between structure, sequence, function 
and evolution grows, it will be embodied in 
additional facilities in the database. 

ABSTRACT A comparison of five constant region sequences 
of human and mouse κ and λ immunoglobulin 
chains has been undertaken in order to reveal sequence 
homologies and evolutionary relationships. Simultaneously, a 
comparison with the three-dimensional structure of one 
mouse κ-chain (McPC 603) has suggested structural reasons 
why many of the residues are invariant or conserved along κ 
versus λ lines. There are a number of residues that have remained 
invariant despite exposed positions for reasons that 
do not appear to be connected with the folding of this CL do- 



The constant region (CL domain) of immunoglobulin light 
(L) chains contributes substantially to the functioning of an 
immunoglobulin molecule. While it is not involved directly 
in the specificity and complementarity of the antibody combining 
sites, it is joined to its counterpart in the heavy (H) 
chain, the CH1 domain, by a variety of noncovalent interactions 
as well as in most immunoglobulins by a disulfide bond 
at its C-terminal or subterminal Cys. This -S-S- bond is not 
essential to L-H association, which is maintained noncovalently 
even after reduction and alkylation. In one immunoglobulin 
subclass of IgA2, the L-H bond does not occur, but 
an L-L dimer is formed, which remains noncovalently 
linked to the H chains. Bence Jones proteins in the form of 
L-L dimers also occur and may be held together by noncovalent 
forces (1). 

There are two subclasses of light chains, κ and λ, which 
are present in almost all species examined; both are found in 
all five classes of immunoglobulins, IgG, IgM, IgA, IgD, and 
IgE, but each immunoglobulin molecule contains two identical 
κ or two identical λ chains. 

The availability of the sequences of the CL domain of 
human κ, human λ, mouse κ, and two mouse λ light chains 
(2), together with x-ray data on the three-dimensional structure 
of this domain (3-5) made it desirable to evaluate, if 
possible, structural influences on the evolution of these do- 

The present study attempts, residue by residue among 
these five chains, to relate preservation or variation of sequence 
to structure and function as evaluated from a three-dimensional 
model of the CL and its interactions with the 
CH1 domain. The findings show some interesting stretches 
of sequence in which invariance predominates, and others in 
which evolutionary divergence has been essentially along κ 



Abbreviations: C and V, constant and variable regions of immuno- 
globulin chains; L and H, light and heavy chains of immunoglobu- 



and λ lines. Positions at the surface of protein molecules that 
are accessible to solvent may undergo many mutational 
changes which do not affect three-dimensional folding, 
while residues that are buried tend to be invariant or highly 
conserved (6). In CL domains mutations involving residues 
that contact the CH1 domain also tend to be restricted. 

The present study shows that residues preserved along κ 
and λ lines generally show conservative substitutions if in 
the interior of the domain or if buried. Fewer exposed residues 
which have diverged along κ and λ lines are homologous. 
Certain residues remain invariant despite an essentially 
exposed position and for no obvious reason. Of special interest 
is the observation that at only two and four positions, 
respectively, were human κ identical with human λ and 
mouse κ identical with mouse λ, while human κ and mouse κ 
were identical at 29 and human λ and mouse λ at 39 positions. 

MATERIALS AND METHODS 

The model of the Fab fragment of mouse McPC 603 constructed 
from x-ray data at 3.1 Å resolution was used (5) as 
well as published information on the CL regions of a Bence 
Jones dimer (4) and human Fab fragment (3). Sequences of 
human κ, human λ, mouse κ, and two mouse λ chains were 
available (2). These sequences were aligned for maximum 
structural and sequence homology from residues 101 to 215 
and modified to include the additional data reported (7). 
Each residue was located in the model of McPC 603 and 
classified according to whether it was: exposed, 0; mainly exposed, 
1; partly exposed, partly buried, 2; mainly buried, 3; 
completely buried, 4; or in contact with CH1, C. In addition, 
each position was classified as: invariant; four of five chains 
and three of five chains identical; human κ and human λ 
identical; mouse κ and mouse λ identical; human and mouse 
κ identical; human and mouse λ identical; and human κ, 
human λ, mouse κ and mouse λ different. 
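
The two classification schemes just described can be written down as a small Python sketch. It is an illustration only: the location codes are copied from the text, but the text does not say how a position was assigned when more than one identity category applied, so the sketch simply reports the largest number of identical chains and every pairwise label that holds. Treating a match with either mouse lambda chain as a mouse lambda identity is an assumption, and the function names are hypothetical.

```python
from collections import Counter

# Location codes from the model, as listed in the text.
LOCATION_CODES = {
    "0": "exposed",
    "1": "mainly exposed",
    "2": "partly exposed, partly buried",
    "3": "mainly buried",
    "4": "completely buried",
    "C": "in contact with CH1",
}


def max_identical(residues):
    """Largest number of chains sharing one residue (5 means invariant)."""
    return Counter(residues).most_common(1)[0][1]


def pair_labels(human_kappa, human_lambda, mouse_kappa,
                mouse_lambda_1, mouse_lambda_2):
    """Pairwise identity labels for one aligned position.

    Assumption: a match with either of the two mouse lambda chains
    counts as an identity with mouse lambda.
    """
    mouse_lambda = {mouse_lambda_1, mouse_lambda_2}
    labels = []
    if human_kappa == human_lambda:
        labels.append("human kappa = human lambda")
    if mouse_kappa in mouse_lambda:
        labels.append("mouse kappa = mouse lambda")
    if human_kappa == mouse_kappa:
        labels.append("human kappa = mouse kappa")
    if human_lambda in mouse_lambda:
        labels.append("human lambda = mouse lambda")
    others = {human_kappa, human_lambda, mouse_kappa}
    if len(others) == 3 and mouse_lambda.isdisjoint(others):
        labels.append("all four classes different")
    return labels


# Position 169 as described in the Fig. 1 legend: Lys in both kappa
# chains, a gap ("-") in all three lambda chains.
print(max_identical(["K", "-", "K", "-", "-"]))
print(pair_labels("K", "-", "K", "-", "-"))
```

Position 169 shows why a position can carry two labels at once, which matches the legend's remark that more than one arrow may occur at a given position.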

RESULTS 

Table 1 lists the sequences of the five chains from positions 
101 to 215. Above each residue is its classification from its 
position in the model. 

Fig. 1 summarizes the sequence data in Table 1 with respect 
to the identities specified above. It is evident that clusters 
of invariant residues and those with 4/5 chains identical 
occur, notably at positions 118 to 123, 148 to 152 (excluding 
150), 176 to 182 (180 3/5 identical), and 194 to 200 (excluding 
195). There are also several clusters in which κ versus λ 
diversification predominates, 128 to 132 (130 invariant) and 
especially 164 to 172. 

Table 1. Sequences of the switch and constant regions of human and mouse immunoglobulin light chains and their location 
from the model of mouse κ myeloma protein McPC 603 constructed from x-ray data 

Sequence data are from Gally (2) with the mouse κ chain data revised according to Svasti and Milstein (7). Residues at 169 to 177 have been 
realigned to remove the gap in all λ chains at 178 and replace it by a gap at 169. Residues 201 and 202 have been moved to 203 and 204, leaving 
a gap at 201 and 202. 

Of the 114 residues considered, 28 were invariant, 17 and 
8, respectively, showed 4/5 and 3/5 chains identical. Of the 
kinds of identities of two chains in the remaining positions 
human κ and human λ were identical at only two positions 
and mouse κ and mouse λ at only four positions, while 
human and mouse κ were the same at 29 positions and 
human λ and one or both mouse λ chains at 39 positions. At 
two positions a human κ and a mouse λ chain were the same 
(*) and at three positions human λ and mouse κ chains had 
the same residue (•). At but eight positions were all four 
chains different. 



Table 2 summarizes the sequence data in Fig. 1 in relation 
to the location of each residue in the model. The most striking 
finding is that of the eight positions at which the four 
chains had different amino acids (Fig. 1), six residues were 
completely exposed to solvent and the remaining two were 
mainly exposed. Moreover, of the two positions at which 
human κ and human λ were the same (one of which was the 
Oz marker at position 190) and of the four positions at 
which mouse κ and mouse λ were identical, all but one were 
completely or mainly exposed. The sixth, residue 135, was a 
contacting residue adjacent to invariant Cys 134. Three of 
the five positions in which human and mouse identities occurred 
but which were not both κ or both λ as indicated by 
the symbols (* and •) in Fig. 1 were completely exposed; 






Fig. 1. Distribution of identical residues in the switch and constant regions of human and mouse immunoglobulin light chains. The arrows 
indicate identical residues in two or more of the five chains as well as at positions at which the four classes of chains differed. When 
more than one arrow occurs at a given position, there was identity among two sets of chains. Thus, at position 169 human and mouse κ chains 
had Lys while human λ and both mouse λ chains had a gap. Arrows with an * and a • indicate the few residues identical in human κ and 
mouse λ and human λ and mouse κ, respectively. A dashed arrow indicates an Asx or Glx at that position in one or more chains. 
Table 2. Location from x-ray structure and degree of evolutionary preservation of residues in the CL domain and switch
region of human and mouse light chains

Column headings: Location; Code; Identities (number of residues): Invariant; 4/5; 3/5; Human κ, Mouse κ; Human λ, Mouse λ; Human κ, Human λ; Mouse κ, Mouse λ; different
these included the Inv marker 191. Position 152, which in
human λ chains carries the Kern marker, was also exposed
(3).

Of the 28 invariant residues 18 were contact residues or 
were mainly or completely buried, while nine were mainly 
or completely exposed. Considering together the 25 positions 
in which 4/5 and 3/5 residues were identical, 12 were com- 
pletely or mainly exposed, only one was a contact residue, 
while five were completely buried. 

Of the 29 positions at which human κ and mouse κ had the
same amino acid and the six (footnote Table 2) at which
human λ and one of the mouse λ chains had the same amino
acid, seven in each were contacting residues and three and
six, respectively, were completely buried while 15 and 20,
respectively, were completely or mainly exposed.

Of the seven invariant exposed residues six were charged, 
2 Glu, 1 Asp, and 3 Lys; the seventh was Cys 214 which 
forms the -S-S- bond to the H-chain or in some cases to an- 
other L-chain. The location of these six residues and the two 
mainly exposed residues His and Thr in the model provides 
no insight into the basis for their invariance. 

The remaining 19 invariant residues, those partly or com- 
pletely buried and the contacting residues, included 12 
strongly hydrophobic residues, 1 Trp, 2 Phe, 4 Pro, 3 Leu, 
and 2 Val; the remaining residues were Cys 134 and Cys 
194, which form the domain S-S bond, two less strongly
hydrophobic residues Ala and Thr, and three Ser.

Examining the residues with 4/5 and 3/5 identities, the
partly, completely buried, and contacting residues were also
overwhelmingly hydrophobic, consisting of 2 Tyr, 5 Val, 1
Thr, and 1 Ala, the others being 1 Gly and 1 His and the gap
at position 201. The nonidentical residues at these positions
were also hydrophobic in almost all cases, involving
replacements of 2 Ile, 2 Leu, and Ala for the five Val residues and
of Phe for one Tyr; the remaining Tyr 192 was replaced by
Ser in both mouse λ chains, the Thr by Ser or His, the Ala by
Asx, and the Gly by Thr. There is obviously an extraordinary
preservation of structure in terms of these groups of residues.

Among the 15 and 16 residues that have diverged along κ
versus λ lines and which are partly, mainly, and completely
buried or are contacting residues, eight are identical pairs;
three of these involve conservative hydrophobic substitutions,
Ile-Leu, Val-Leu, and Leu-Ile, at positions 117, 132,
and 136, respectively, involving mainly and completely buried
residues; the remaining pairs are the substitutions Thr-Lys
at 172, Gln-Glu 124, Ser-Thr 131, Glu-Ser 165, and
Thr-Tyr 178, the last four being contacting residues. The
presence in κ chains of an additional residue at 169 causes
Thr 172 to be completely buried in the mouse Cκ domain; in
the λ chains, Lys 172 is probably completely exposed to
solvent. The unpaired residues are 1 Phe, 1 Tyr, 2 Ser, 1 Asp,
and 1 Thr in the κ, and 1 Ala, 1 Val, 1 Gln, 1 Glu, 1 Lys, and
3 Thr in the λ group.

The κ versus λ differences show a predominance of completely
and mainly exposed residues with 14 of 29 and 20 of
36 residues in κ and λ, respectively, falling into this group;

The region 101-108 consists of two invariant residues, 101
and 102, followed by six positions in which residues with 4/5
identities alternate with residues which have evolved along κ
and λ lines, including the gap at position 108; four of the
positions are completely exposed, one is partly exposed, and
two are completely buried. Arg 107 marks the end of the
mouse Vκ domain; Cκ starts with Ala 109. The additional
residue at position 108 in λ chains could be accommodated by
a hairpin bend facilitated by Gly 107 or by Pro 109.

There is a cluster of invariant residues from 118 to 123 
consisting of three contacting residues, 118, 119, and 121, 
one partly buried 120, and two exposed residues, 121 Ser 
(4/5) with an Asp alternative, and an invariant Glu 123. The 
region 124-140 contains some invariants but is largely made 
up of residues which have evolved along κ and λ lines;
residues 130-137 consist exclusively of contacting and completely
buried residues of which three are invariant, in another
three κ and λ differ, and in the others more variation
has occurred. At 135 human κ and λ have Leu while mouse κ
has Phe and mouse λ Thr and at position 137 human and
mouse κ have Asn, human λ Ser, and mouse λ Thr.

The most striking region which has been preserved along
κ versus λ lines, 160 to 175, has five contacting, one completely
and four partly buried residues, and two mainly and
three completely exposed residues. It is followed by a largely
invariant cluster, 176 to 181, of contacting and buried residues.
The remainder of the molecule is mainly exposed except
for the region around Cys 194, in which buried and
mainly exposed residues alternate and there is no clustering
of invariant or κ versus λ residues.



DISCUSSION 

The general structural relationships described for the CL
region of human and mouse κ and λ chains are an initial
attempt to understand the basis for the evolutionary preservation
of certain regions as essentially invariant and the preservation
of others along κ versus λ lines, an evolutionary divergence
which took place about 200 million years ago (8). The
principles established from sequence and structural findings
on other proteins generally apply equally well to the
immunoglobulins. Thus the buried residues and those contacting
the heavy chain tend to be largely invariant and hydrophobic,
while residues that are exposed or mainly exposed may
vary and are generally polar. These also include the few
residues that have evolved along human versus mouse lines.
There are, however, substantial numbers of hydrophobic 
residues that are invariant or identical in 4/5 or 3/5 chains 
and that occur in regions of the molecule for which no ob- 
vious structural basis for their conservation may be assigned. 

The marked clustering of residues which have been preserved
along κ versus λ lines in certain portions of the chain,
most notably at positions 160 to 175, or have been main- 
tained invariant, such as 118 to 123, suggests that these may 
have unique functions. 

The CL domain has been extraordinarily preserved once κ
versus λ diversification occurred, since at only eight positions
were all four chains different and at only five other positions
were human and mouse chains identical despite a κ
versus λ difference (Fig. 1).

The immunoglobulin findings were compared with values
for human and mouse hemoglobins, using the α chain as
equivalent to κ and the β (7) chain equivalent to λ (8). Since
hemoglobin chains are longer, the data were normalized to
115 residues. The findings for hemoglobin are strikingly different
from the immunoglobulin results. Thirty-six residues
were invariant, and 4/5 and 3/5 chains were identical at 12
and at 49 positions, respectively. Human and mouse α and
human and mouse β were identical at 13 and 11 positions,
respectively. At only one position each were mouse α and
mouse β, human α and mouse β, and human β and mouse α
identical. There were no positions at which all four chains
differ. Thus there were 97 residues in which three or more
chains were identical in hemoglobins in contrast to 53 such
residues in the immunoglobulins. This comparison indicates
a higher degree of evolutionary conservation of residues that
did not differentiate along α versus β lines than residues that
differentiated along κ versus λ lines. The immunoglobulin
CL domain thus shows a higher degree of evolutionary
adaptability than do the hemoglobins.
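
The totals quoted in this comparison can be re-added from the individual counts given in the text. The short Python sketch below does only that; the dictionary and function names are illustrative and do not come from the paper.

```python
# Counts of positions quoted in the text for each category of identity.
immunoglobulin = {"invariant": 28, "4/5": 17, "3/5": 8}
hemoglobin = {"invariant": 36, "4/5": 12, "3/5": 49}


def shared_in_three_or_more(counts):
    """Sum the invariant, 4/5 and 3/5 categories."""
    return sum(counts.values())


ig_total = shared_in_three_or_more(immunoglobulin)
hb_total = shared_in_three_or_more(hemoglobin)
print(ig_total, hb_total)
```

The two sums correspond to the 53 immunoglobulin and 97 hemoglobin positions cited above.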
Antibodies are the archetypal molecules of the Ig-fold superfamily. Their
highly conserved β-sheet architecture has evolved to avoid aggregation by
protecting edge strands. However, the crystal structure of a human Vκ
domain described here reveals an exposed β-edge strand which mediates
assembly of a helical pentadecameric oligomer. This edge strand is highly
conserved in Vκ domains but is both shortened and capped by the use of
two sequential trans-proline residues in Vλ domains. We suggest that the
exposure of this β-edge in Vκ domains may explain why light-chain
deposition disease is mediated predominantly by κ antibodies.

© 2007 Published by Elsevier Ltd. 
Antibody immunoglobulin variable domains are
amongst the most intensively studied of β-protein
structures. They are comprised of two chains, light
and heavy, which form a functional heterodimer.
Each chain in turn is comprised of several domains,
the N-terminal variable (V) domain of which is
responsible for targeting to pathogens and is the
focus of somatic hypermutation. Human antibodies
utilise two alternative light-chain sequences,
subtypes κ and λ, which appear to be functionally
identical and are highly structurally homologous.
Nevertheless, there are pathological differences. λ
antibodies are found in two-thirds of light-chain
amyloidosis (AL) cases 1 whereas κ antibodies
mediate >85% of light-chain deposition disease. 2,3
Furthermore, κ antibodies can be targeted by the
bacterial pathogen Peptostreptococcus magnus
superantigen, Protein L, whereas λ cannot. 4

Abbreviation used: CDR, complementarity-determining region.

We wondered whether these differences were due
to structural differences between the isotypes.
Superposition of the X-ray crystallographic structures
of the κ and λ domains reveals that they are
highly similar and broadly duplicate the same
topological features. Each variable domain has a
typical immunoglobulin fold consisting of two
antiparallel β-sheets closely packed to produce a
flattened β-barrel (Figures 1 and 2). As in other
β-sheet containing proteins, antibodies employ "negative
design" features to protect edge strands and
prevent aggregation of the native state. 5 In antibody
light-chains, one end of the β-barrel is formed by
variable loop CDR2, which twists in an S shape to
Figure 1. 2-D view of the Vκ domain secondary
structure. The domain has been divided
and opened out with the "outer" sheet on the left and
"inner" sheet on the right. Main-chain hydrogen bonds are
marked according to their conservation amongst Vκ
sequences.



cap both strand C on the five-strand inner sheet and 
strand D on the three-strand outer sheet. At the 
other end of the barrel, N-terminal strand A 
employs a "strand switch" device to protect the 
two edge strands from each sheet (B on the outer
sheet and G on the inner; Figure 1).

It is in the design of this strand switch that κ and λ
clearly diverge. Vκ domains force a strand switch
through an absolutely conserved cis-proline at
position 8 (Figures 1 and 2), leaving the post-kink
strand structurally exposed at residues 9-12. Indeed
this region is able to mediate intermolecular
β-zipper interactions with the P. magnus protein,
Protein L (PpL). By contrast Vλ chains combine a
trans-proline at residue 8 with an additional
trans-proline at position 9, whereby the post-kink strand is
shortened to three residues and capped through the
planar ring of proline 9.

During studies on Vκ domains, gel filtration of
VD9 6 revealed multiple oligomeric species, and it 
spontaneously aggregated at concentrations above 
4 mg/ml. The aggregates were visibly amorphous 
but when examined under the electron microscope 
resolved as straight non-branching fibres 7!(±0.3) 
nm in width and 60-180 nm long (Supplementary 
Data, Figure 1). Whilst these fibres superficially 
resemble amyloid, Congo red and serum amyloid P 
component binding assays revealed that amyloid 
was not present in appreciable amounts (Supple- 
mentary Data, Figure 2 and Supplementary Data, 
Table 1). When concentrated to between 3-4 mg/ml 
and allowed to concentrate slowly by vapour 
diffusion against a wide range of different buffers 
VD9 precipitated in regular hexameric crystals. We




Figure 2. Stereo view of a Vκ domain. Secondary structure representation of Vκ VD9 showing CDR loops and β-edge
strand A. View is rotated 90° with respect to Figure 1.
used these crystals to determine the molecular
structure of VD9 (Table 1). This revealed a highly 
oligomeric structure that contrasted with all of the 
hundreds of antibody structures solved to date, 
where a simple dimer is almost always observed. 

The asymmetric unit or smallest repeating oligomer
of VD9 consists of 15 molecules assembled into
a close-packed left-handed helix (Figure 3(a)), and
these 15-molecule helical oligomers are themselves
stacked end-to-end to create a repeating fibre. The
adjacent copies are related by 2-fold rotational
symmetry orthogonal to the fibre axis (Figure 3(a)
and (b)). There are six VD9 molecules per turn, each
rotated by 60°, to give a pitch of ~35 Å or 1 VD9
domain (Figure 2(b)). All VD9 molecules are
natively folded and pack antiparallel down the
helix predominantly at 45° with respect to the fibre
axis; the inter-subunit interactions create extended
β-sheets across antiparallel packed monomers
(Figure 3(c)). The centre of the fibre is hollow,
comprising a water-filled nanotube of 20 Å diameter
(Figure 3(d)), and with an overall diameter (about
70 Å), which is similar to that of VD9 aggregate
fibres grown in solution and visualised by electron
microscopy. The pentadecamer is stabilised by two
novel interfaces, the first involving inter-digitation
of hydrophobic side-chains and the second
main-chain β-zipper interactions (Figure 4).
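
As a rough check on the geometry described above, the following Python sketch derives the rotation and axial rise per subunit and the axial repeat contributed by one 15-molecule oligomer, starting from the quoted six molecules per turn and a pitch of about 35 Å. It assumes an ideal, perfectly regular helix, and the function and parameter names are illustrative only.

```python
def helix_summary(subunits_per_turn=6, pitch_angstrom=35.0, subunits=15):
    """Return (rotation in degrees, rise in Å, turns, axial repeat in Å)."""
    rotation = 360.0 / subunits_per_turn        # rotation between neighbours
    rise = pitch_angstrom / subunits_per_turn   # axial rise per subunit
    turns = subunits / subunits_per_turn        # helical turns per oligomer
    axial_repeat = turns * pitch_angstrom       # length added per stacked oligomer
    return rotation, rise, turns, axial_repeat


rotation, rise, turns, axial_repeat = helix_summary()
print(rotation, round(rise, 2), turns, axial_repeat)
```

Under these assumptions the 60° rotation stated in the text follows directly from six molecules per turn, and one pentadecamer spans two and a half turns.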

The first interface involves residues from and
adjacent to the three complementarity-determining
region (CDR) loops. Six tyrosine residues at positions
32, 49 and 92 on each molecule form a herringbone
packing interaction consisting of π-π or edge-to-plane
stacking of the aromatic rings (Figure 4(a)).
The OH groups of tyrosine residues 32 and 92 are
orientated to allow hydrogen bonding with opposing
asparagine and glutamine side-chains (such that
Tyr32A interacts with Asn34B and Tyr92A with
Gln55B). Finally, there is a hydrophobic stacking
interaction between Tyr92 and the planar side-chain
of Asp49. These interface residues are remarkably
conserved amongst Vκ sequences. The three tyrosine
residues (32, 49 and 92) are found in 67%, 91% and
22% of Vκ sequences, respectively, whilst Asn34 and
Gln55 are present in 20% and 35%, respectively.
Furthermore, the VD9 residue is generally the most
prevalent, suggesting that this interface could form
in many Vκ antibodies.

The second interface is formed by the association
of antiparallel β-edge strands (residues 9-12) from
adjacent monomers to create an inter-molecular
ten-stranded β-sheet (Figures 2(c) and 3(b)). The edge
strands interact through a classic β-zipper main-chain
hydrogen bond network, involving six close-paired
hydrogen bonds between residues 9-12
(SSLS) (Figure 4(b)). The resulting inter-molecular
β-sheet has all the characteristics of a standard
antiparallel architecture. The Cα atoms of the edge
strands align both with themselves and with the Cα
atoms of the additional intra-molecular strands.
Furthermore, the side-chain periodicity is the same
across the sheet such that adjacent residues are
all above or all below the plane of the sheet. The
resulting "pleating" creates a typically shortened
Cα(i)-Cα(i+2) atom distance of between 6-7 Å and a regular
"sideways" distance between adjacent Cα atoms
across the sheet of ~5 Å. Finally, this antiparallel
rather than parallel packing of adjacent edge strands
creates a strong oligomerisation interface as it allows
the amide groups (and thus the inter-strand hydrogen
bonds) to be planar and energetically most favourable.
There are further packing interactions behind the
zipper, including the side-chains of Val19, Pro8 and
Thr20 (which create a small hydrophobic core) and
hydrogen bonds from Ser7 to Thr20 and Thr22 (Figure
4(c)). Together, these interactions create a
solvent-excluded interface of 790 Å².

When we superposed the light-chain from the
solved Protein L-Fv complex 4 with one of the molecules
from the VD9 oligomer, we found that the four
β-strands of Protein L and the first four β-strands of
the next VkD molecule in the oligomer came into
close alignment. Indeed, Protein L binding and VD9
oligomerisation form an identical hydrogen bond
network with the same residues (Figure 4(d)). The
distances and angles of the hydrogen bonds deviate,
respectively, by less than 0.2 Å and 6°. The only
difference is that the VkD structure appears to have
an additional hydrogen bond between the amide
carbonyl and nitrogen atom of residues 12 and 9,
respectively (in the corresponding residue, 835, in
Protein L the nitrogen points inwards).

To confirm the role of this interface in oligomerisation
and Protein L binding we made a series of
proline mutants to disrupt the β-edge. Oligomerisation
was assessed by comparing aggregation at
4 mg/ml, and Protein L binding by capture on
Protein L-agarose. The introduction of a single
proline at residue 12 disrupted both oligomerisation
and Protein L binding. Furthermore, this had a
dramatic effect on both the solubility and expression
of VD9, the mutant not aggregating at concentrations
>25 mg/ml.

Taken together our results suggest that the
exposed β-edge at residues 9-12 facilitates both
aggregation and binding of super-antigen. Despite
Figure 4. VD9 fibre interface interactions. Opposing monomers are coloured orange or wheat. Atoms are labelled
with chain identifier, residue number and atom type in that order. Putative hydrogen bonds are indicated with a broken
line. (a) Tyrosine hydrophobic-stacking interface, (b) β-zipper interface, front view, (c) β-zipper interface, back view, (d)
Binding interface between Protein L (orange) and Vκ (wheat).



these pathogenic effects, the design of the Vκ
strand switch is highly conserved. From an earlier
analysis of switch variants, 7 the wild-type
appeared to be the most energetically stable, but
these variants also proved more prone to aggregation
during bacterial expression. This is entirely
consistent with our findings for VD9, which is both
aggregation-prone and thermodynamically stable
with a ΔG value of 9.9 kcal mol⁻¹ and a Tm value
of 66.6 °C (using 5 µM VD9 in phosphate-buffered
saline (PBS) (pH 7.4) at 25 °C and 85 °C and
recorded at 235 nm. The unfolding curves were
assumed to be two-state and fitted as described 8
using a ΔCp contribution of 12 cal per amino acid
residue to give Tm and ΔG(N-U); data not shown).
We suggest the two strand switch designs each
have their own advantages and disadvantages.
Thus, in the κ design a short segment of β-strand is
exposed, leading to super-antigen interaction and
association with light chain deposition diseases, 2,3
whereas in the λ design the strands are capped but
the domain is less stable (and associated with
amyloidosis 1). In turn this suggests that light chain
deposition diseases may be driven by native-like
interactions between thermodynamically stable
domains, in contrast to light chain amyloidosis,
which seems more likely to be driven by the
interactions of unfolded polypeptide. 9,10

Both the β-edge and CDR interfaces observed here
between free κ-domains should also be permitted
between free κ-light chains or between Bence-Jones
κ-light chain dimers in vivo. By contrast, inspection of
three-dimensional models of IgG 11 indicates that due
to steric constraints, only one or other set of
interactions is permitted in intact IgG. Light-chain
deposition disease is characterised by abnormal
levels of free κ-chain synthesis, either as a result of
lymphoproliferative disorders such as myeloma,
which account for 60% of cases, 12 or by unbalanced
immunoglobulin synthesis in bone marrow cells, 13
and biopsies of diseased glomerular membranes
stain for κ-chains but not IgG or λ. 14 Whether in vivo
deposits combine the interfaces as seen in the crystal
is unclear. Crystalline κ-chain deposits have been
observed in glomerulonephritis: κ-chain crystal
deposition in proximal tubular cells is a cause of
proximal tubulopathy 15 whilst cytoplasmic crystalline
κ-chain inclusions have been observed in
malignant plasma cells in Fanconi's syndrome. 16

It is also possible that the exposed segment of the
Vκ domain confers a functional benefit, for example
in pathogen recognition. There are still many aspects
of the antibody immune response which are poorly
understood, particularly how the primary response,
with its diversity limited to the number of circulating
B cells, can bind with physiological affinity to an
almost infinite diversity of pathogen molecules. The
observation that polyclonal antibodies are often
more effective at neutralising antigens than individual
constituent antibodies has led to the suggestion
that they operate synergistically. Potentially, the Vκ
strand switch sequence could allow antibodies from
a polyclonal response to multimerise through edge-edge
interaction, significantly increasing affinity
through avidity. Calarese et al. recently showed
that variable domains can adopt unusual quaternary
structures. In their case, an anti-HIV-1 antibody
underwent heavy chain domain exchange in order
to bind a repetitive carbohydrate epitope. 17

β-Edges have long been hypothesised to play a
central role in protein aggregation. Indeed, it has been
suggested that the protection of edges may well have
been the primary driving force behind present day
β-protein topology. 5,18 Our results show that in
antibodies the failure to protect a short β-edge in Vκ
domains is exploited by pathogens and may also lead
to light chain deposition disease. Subtle differences in
strand protection in Vκ and Vλ domains therefore
appear to have major consequences for pathology.



Supplementary Data 

Supplementary data associated with this article
can be found, in the online version, at
doi:10.1016/j.jmb.2006.10.093
Summary

Background: Peptostreptococcus magnus protein L 
(PpL) is a multidomain, bacterial surface protein whose 
presence correlates with virulence. It consists of up to 
five homologous immunoglobulin binding domains that 
interact with the variable (V L) regions of kappa light
chains found on two thirds of mammalian antibodies. 

Results: We refined the crystal structure of the complex
between a human antibody Fab fragment (2A2) and a
single PpL domain (61 residues) to 2.7 Å. The asymmetric
unit contains two Fab molecules sandwiching a single
PpL domain, which contacts similar V L framework
regions of two light chains via independent interfaces. The
residues contacted on V L are remote from the hypervariable
loops. One PpL-Vκ interface agrees with previous
biochemical data, while the second is novel.
Site-directed mutagenesis and analytical-centrifugation
studies suggest that the two PpL binding sites have
markedly different affinities for V L. The PpL residues in
both interactions are well conserved among different
Peptostreptococcus magnus strains. The Fab contact
positions identified in the complex explain the high
specificity of PpL for antibodies with kappa rather than
lambda chains.

Conclusions: The PpL-Fab complex shows the first
interaction of a bacterial virulence factor with a Fab light
chain outside the conventional combining site. Structural
comparison with two other bacterial proteins interacting
with the Fab heavy chain shows that PpL,
structurally homologous to streptococcal SpG domains,
shares with the latter a similar binding mode. These two
bacterial surface proteins interact with their respective
immunoglobulin regions through a similar β zipper
interaction.

*Correspondence: estura@cea.fr (E.A.S.), jb.charbonnier@cea.fr (J.B.C.)

Introduction 

Peptostreptococcus magnus protein L (PpL: whole
protein L. PpL domain: individual domain of protein L) is a
cell wall-anchored protein able to interact with a large
repertoire of mammalian immunoglobulins (Ig) [1].
Staphylococcus aureus protein A (SpA) [2] and
streptococcal protein G (SpG) [3] also share this property,
although they recognize different Ig regions. PpL is
present at the surface of about 10% of Peptostreptococcus
magnus strains [4]. It is a 76-106 kDa protein containing
four or five highly homologous, consecutive extracellular
Ig binding domains (depending on the bacterial strain
from which it is isolated [5, 6]). The structure of PpL
domain B1 (76 amino acids) has been determined by
NMR spectroscopy [7]. The fold of this domain is similar
to that of the SpG Ig binding domains. It consists of a
β sheet composed of two pairs of anti-parallel β strands
and an α helix that lies on top of the sheet.

PpL interacts with Ig light chains [1], notably with the
kappa light-chain variable region (V L) from humans and
other mammals [8, 9]. When present on the bacterial
surface, PpL has been described as a virulence factor
of bacterial vaginosis in different clinical specimens [4].
It has also been shown that PpL induces histamine
release by basophils and mast cells, presumably by
cross-linking IgE bound to Fcε receptors [10, 11].

PpL and SpA single domains are targets for protein 
engineering due to their ability to bind the variable re- 
gions (Fv) of a large population of antibodies. This 
unique property makes them valuable tools in biotech- 
nology for the purification and recognition of recombi- 
nant single-chain (sc) Fv. Such engineered antibodies 
are increasingly used in the optimization of specificity 
and affinity by phage display and other in vitro evolutionary
mutagenesis techniques [12]. The structural characterization
of PpL and SpA binding properties is useful
for defining the spectrum of Fvs, which bind to these 
domains, based on their sequences. 

We report in this study the crystal structure of the 
human antibody Fab 2A2 complexed through its V L re- 
gion to a PpL domain. The same Fab was the subject of 
a previous crystallographic study describing its complex 
with a Staphylococcus aureus protein A (SpA) domain,
which binds to the antibody V H region [13]. Here, we 
compare the location of these two Ig binding domains 
on either side of the Fv region of Fab 2A2 in relation to 
the antigen-combining site. Unexpectedly, we find that 
there are two PpL-Fab interfaces, such that one PpL
domain has two separate regions that can interact with
kappa light chains and is capable of binding two Fabs
simultaneously. The two interfaces involve similar sites
on the V L domains. One of the PpL-Fab interfaces
conforms to previous biochemical data, while the second
is novel. Site-directed mutagenesis and analytical-centrifugation
studies suggest that the two PpL combining
regions have markedly different affinities for V L. The high
specificity of PpL for kappa (the largest mammalian V L
gene family), rather than lambda chains, is discussed in
light of the structure of the complex. We analyze the
sequence diversity of the PpL domains at positions
involved in the interaction with V L regions and discuss
avidity effects reported between whole PpL and Ig.
Furthermore, by comparing the interactions of PpL and SpG
with V L and C H 1, respectively, we note that at a structural
level, the Fab binding modes of these two bacterial
proteins show a degree of convergence.

Results and Discussion 

The First V L -PpL Interface 

The asymmetric unit has one PpL domain and two Fab 
molecules. The PpL domain is in close contact with the 
V L region from both Fabs (Figure 1a). This stoichiometry 
was unexpected because no previous studies raised 
any evidence for one PpL domain being able to complex 
to two V L regions simultaneously [6, 14]. 

The first interface buries a total solvent-accessible
area of 1300 Å² with approximately equal contributions
from both molecules, as determined with a 1.4 Å radius
probe, and is remote from the Ig heavy chain. There is
no conformational change in the backbone of either
partner upon binding. The PpL domain C* used in this
study has 94% sequence identity with the C4 domain
[15]. Its affinity for kappa chains is comparable (130 nM)
to the one measured for PpL domain B1 [14, 15]. Domain
C* maintains the same three-dimensional structure as
PpL domain B1 determined by NMR [7], and the two
structures superpose with an rmsd of 1.31 Å over 59
residues. The complexed PpL domain C* superposes
with an rmsd of 0.39 Å over 61 residues of the crystallographic
structure of free PpL domain C* (B.J.S., unpublished
data). Similarly, the variable region of the complexed
Fabs superimposes with an rmsd of 0.6 Å over
215 residues with the free structure determined
previously [13].

The interaction with PpL involves 13 residues from
the Fab (Figures 1b and 1c). Ten are located in framework
region 1 (FR1). The others are Lys-L107 from the
segment connecting the V L to the C L region, Glu-L143
from the C L region, and Arg-L24 from the CDR-L1 region
[16] on strand B. However, this residue does not belong
to the V L hypervariable loops according to the structural
definition of Chothia [17] or to the positions frequently
identified by the contact with antigens [18]. Compared
with SpA, which binds to the variable region of the same
human antibody (Fab 2A2) [13], PpL is farther away from
the center of the antigen binding site (23 Å compared
to 16 Å). Hence, like SpA, PpL binding should not affect
accessibility to the antigen-combining site, as also
suggested by competition assays [19].



Of the amino acids in the PpL domain, 12, located
mainly on strand β2 and the α helix, are involved in this
interaction (Figure 2). This interface is characterized by
six hydrogen bonds (Figure 1b, Table 1). Three are
between main-chain atoms located on PpL strand β2 and
on strand A from the first Fab; they thus join the β sheets
of the Fab and the protein L into a unique sheet through
a β zipper interaction.

Heteronuclear NMR spectroscopy has been used for
mapping backbone positions of the PpL domain B1 [20]
involved in the interaction with the V L region. Most
positions identified for domain B1 are also implicated in the
first interface of domain C* described in the present
study. These backbone positions are on strand β2 and
the α helix of the PpL domains. A minor discrepancy
with the NMR results concerns the loop between the α
helix and strand β3, which does not make any contact
with the Fab V L region in the crystal structure of the
complex. This loop is poorly defined in the NMR structure,
so the discrepancy can be attributed to a change
in mobility upon complexation.

The first interface was subjected to site-directed mu- 
tagenesis of PpL domain C*. A 23-fold drop in affinity 
results from a Y53F substitution [21] as measured by 
competitive ELISA. This decrease following the loss of 
the tyrosine hydroxyl is as expected on the basis of the 
energy of a neutral hydrogen bond [22], consistent with 
the loss of the interaction between the Tyr-53 hydroxyl
group and the Thr-L20 carbonyl group of the V L region 
(Table 1). Thus, the first interface explains the existing 
data well. 



The Second V L -PpL Interface 

In the crystal, the single PpL domain is sandwiched
between the V L regions of two 2A2 Fabs present in the
asymmetric unit (Figure 1a). The second interaction
buries a total solvent-accessible area of 1400 Å²,
comparable to other protein-protein complexes [23]. The PpL
domain C* has a different orientation in the two
interactions relative to the Fab β sheet. This second interaction
involves 15 V L residues, located mainly on the β strands
A and B (as in the first interface) with some participation
of strands D and E (Figure 1c). Out of the 15 residues
from the V L region involved in this second interaction,
10 are common to the first one. In contrast, none
of the PpL residues that contribute significantly to this
interface are involved in the first one. The 14 amino acids
from the PpL domain involved in this second interaction
come mainly from strand β3 and from the α helix (Figure
2). Six hydrogen bonds and two salt bridges mediate this
interaction (Table 2), as compared to only six hydrogen
bonds for the first interface (Table 1). Although
unexpected from biochemical studies, this second PpL-Fab
interface buries an area too large to be a crystal contact.
Given that this interface buries a surface larger than
1400 Å², the probability that it is just a crystal contact
can be evaluated as only 2% [24]. Thus, we believe that
this interface has to be given serious attention.

What is the strength of this second interaction? At 
present, no definite answer can be given, but we have 
used mutagenesis to probe the relative contributions of 
these two interfaces. First, we constructed the Y64W 




Figure 1. PpL Domain C* Complexed with Human IgM Fab 2A2

(a) Ribbon representation of the 2 Fab:1 PpL domain complex present in the asymmetric unit. The PpL domain C* (red) is [...]
two Fabs (blue and green). Light colors represent the light chains, and dark colors represent the heavy chains. Magenta [...]
loops, as defined by Chothia [17], and positions the Fab-PpL interfaces relative to the combining site. Pseudo 2-fold axis [...]

(b) [...] 2Fo - Fc electron density map contoured at 1σ at the first V L -PpL interface. The letter [...]
Fab light chain. Green dotted lines depict the three hydrogen bonds between Fab and PpL main-[...]
bonds are shown in magenta and involve at least one side-chain atom.

(c) Ribbon representation of the V L region of Fab that interacts with the PpL domain. Ten residues common to the first and [...]
are in yellow. Positions in pink are only involved in the first interface, and those in light green are implicated only in the [...]
mutant of a PpL domain C* to have an efficient
fluorescent probe so as to measure Kd by the stopped-flow
method [21]. Second, we constructed two double
mutants, Y53F-Y64W [21] and D55A-Y64W (this study), with
the purpose of weakening, respectively, the first and
second interfaces. In the Y53F-Y64W mutant, the
replacement of Tyr-53 by Phe disrupts the hydrogen bond
between the tyrosine hydroxyl and the L20 carbonyl in
the first interface. Similarly, we chose the D55A-Y64W
mutant in order to disrupt the salt bridge between
Asp-55 and Arg-L24 as well as the hydrogen bond with the
Ser-L7 side chain in the second interface. These three
single PpL domain constructs are correctly folded, as
indicated by far-UV CD spectroscopy (data not shown).
While the loss of a single hydrogen bond at the first
interface has a significant effect on the dissociation
constant of the V L -PpL domain C* complex, the disruption
of the salt bridge and the hydrogen bond mediated by
Asp-55 of PpL at the second interface does not alter
the Kd of the complex (Table 3). This suggests that
the binding of single PpL domain C* is dominated by the
first interface and that the dissociation constant of the
second is at least one order of magnitude larger than
the first one.
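
The comparison drawn from the stopped-flow data can be made explicit with a little arithmetic. The following is a minimal sketch, not part of the original analysis: it takes the three Y64W-based dissociation constants listed in Table 3, computes each double mutant's fold change relative to the Y64W reference, and converts that ratio into a binding free-energy change using ΔΔG = RT ln(Kd,mutant/Kd,reference). The temperature of 298 K and the function names are assumptions introduced for illustration; because only ratios are used, the concentration unit of Kd cancels.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ mol^-1 K^-1

# Dissociation constants as listed in Table 3 (all in the same unit).
KD = {
    "Y64W": 0.17,        # fluorescent-probe reference construct
    "Y53F-Y64W": 2.8,    # hydrogen bond on interface 1 disrupted
    "D55A-Y64W": 0.145,  # hydrogen bond and salt bridge on interface 2 disrupted
}


def fold_change(kd_mutant, kd_reference):
    """Ratio of mutant Kd to reference Kd (values above 1 mean weaker binding)."""
    return kd_mutant / kd_reference


def ddg_binding(kd_mutant, kd_reference, temperature_k=298.0):
    """Change in binding free energy in kJ/mol; 298 K is an assumed temperature."""
    return R_KJ * temperature_k * math.log(kd_mutant / kd_reference)


if __name__ == "__main__":
    reference = KD["Y64W"]
    for name in ("Y53F-Y64W", "D55A-Y64W"):
        ratio = fold_change(KD[name], reference)
        ddg = ddg_binding(KD[name], reference)
        print(f"{name}: {ratio:.2f}-fold, ddG = {ddg:+.2f} kJ/mol")
```

Under these assumptions the interface 1 mutant loses several kJ/mol of binding energy, the order of magnitude associated with a single neutral hydrogen bond, whereas the interface 2 mutant is essentially unchanged.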

Evidence supporting the existence on a single PpL
domain of two Fab-combining regions, differing in
affinities by about two orders of magnitude, has been
obtained recently by analytical ultracentrifugation studies
(B.J.S. and R. Beavil, unpublished data).

This observation is reminiscent of the 1:2 stoichiometry
described for the interaction between human growth
hormone and its receptor [25], where two different faces
of the hormone contact similar surfaces on two receptors,
with the result being receptor activation [26]. By
analogy, in our study a single PpL domain contacts
similar surfaces on two V L regions and could bridge two
Igs anchored at the membrane of B cells. The biological
relevance of this second interface and the potential
mitogenic activity of a single PpL domain on B cells are
under investigation.

Table 1. Hydrogen Bonds Involved in the First V L -PpL Interface

V L Region       PpL Domain C*    Distance (Å)

Ser L-9 N        O Glu 38         2.8
Ser L-9 Oγ       N Lys 40         3.1
Ser L-10 Oγ      Oε1 Glu 38       2.9
Ser L-10 O       N Glu 38         3.1
Ser L-12 N       O Thr 36         2.7
Thr L-20 O       Oη Tyr 53        2.6

Hydrogen bond distances were calculated with the program [...]
according to the criteria of McDonald and Thornton [45].



V L Specificity of the PpL Binding Interaction
PpL has been reported to bind efficiently to V L regions
of about two thirds of the human Ig repertoire and to
thus encompass mainly the κ1, κ3, and κ4 subgroups
[8] but neither κ2 nor any λ subgroups. The PpL kappa
specificity observed for human antibodies extends to
various other mammalian species [9]. To analyze the
structural determinants of this specificity, we have
superimposed on the 2A2 V L region the equivalent
structures found on different kappa and lambda subgroups.
We observed that PpL binding ability is mostly
concentrated on the L5 to L12 segment of the V L region and
that the interaction is dependent on the main-chain
conformation of this segment, on which conformation the
β zipper interaction also depends. Particularly important
are residues L8-L12, which bury more than 80% of their
total accessible area. The V L residues of this segment
and at other positions of the first interface are well
conserved among the recognized Vκ regions (Table 4). The
main chain superimposes well onto V L structures of
the κ1, κ3, and κ4 subgroups (rmsd of 0.3-0.5 Å over the
eight residues from L5 to L12). Most κ2 subgroup
sequences have a proline residue at position L12 that
introduces major steric hindrance by pointing the Pro ring
toward the PpL domain. For the λ subgroup, two
observations account well for the absent or weak binding
activity found for these chains. Firstly, the L5-L12
segment is one residue shorter, resulting in backbone
conformational differences, which may alter the β zipper
interaction (rmsd of 1-1.2 Å over the seven residues
from L5 to L12). Secondly, due to the reduced size of
this segment, a cavity is created around positions L7
and L8. The Fab recognition mode developed by the
PpL domain is highly dependent on the backbone
conformation, different from that of SpA, which relies more
on a side chain-specific recognition mode [13].

Table 2. Hydrogen Bonds and Salt Bridges Involved in the
Second V L -PpL Interface

V L Region       PpL Domain C*    Distance (Å)

Ser L-7 Oγ       Oδ1 Asp 55       2.9
Ser L-10 Oγ      Oγ1 Thr 65       3.2
Ser L-12 N       O Ala 88         2.9
Ser L-12 O       N Leu 68         2.6
Arg L-18 N       O Gly 71         3.1
Arg L-24 Nη1     Oδ1 Asp 55       2.7
Lys L-107 Nζ     Oδ1 Asp 87       2.6
Lys L-107 Nζ     O Leu 68         2.9

Hydrogen bond and salt bridge distances were calculated with the
program CONTACTS [44] with a maximum distance cutoff of 3.4 Å
and defined according to the criteria of McDonald and Thornton [45].

Table 3. Dissociation Constants between PpL Single Domain C*
and Human κ Chain as Determined by Stopped-Flow

PpL Construct    Kd (µM)    Comment

Wild type        0.13 (a)
Y64W             0.17 (b)   Fluorescent probe
Y53F-Y64W        2.8 (b)    Hydrogen bond on interface 1 disrupted
D55A-Y64W        0.145      Hydrogen bond and salt bridge
                            on interface 2 disrupted

(a) From Beckingham et al. [14].
(b) From Beckingham et al. [21].

Avidity of Whole PpL for Immunoglobulins
PpL contains four or five highly homologous,
consecutive extracellular Ig binding domains in tandem,
depending on the strain. At what level are the positions
involved in the first interface conserved between the
different domains? Sequence alignment of the four C
domains and five B domains with the C* domain (Figure
2) shows that domain C1, which differs at 20 positions
out of 61, is the most distant from domain C*. The
differences on C1 are distributed all over the domain, with
higher incidence on the α helix and strands β2, β3, and
β4. The other C domains have a higher degree of
sequence identity with C*. The B domains have from 10
to 17 well-distributed changes compared to C*.

Structurally, a core of seven critical residues can be
identified. These residues are those largely buried upon
complex formation or those involved in hydrogen bonds:
Gln-35 to Lys-40 from strand β2 and Tyr-53 from helix
α. These seven residues are strictly conserved in eight
out of the ten PpL domains (Figure 2). Domains C1 and
B5 each have only one nondisruptive difference. The
changes from Thr-36 to Asn in C1 and from Glu-38 to
Thr in B5 could weaken the interaction but not disrupt
it. Residue changes outside the structural core in
domains C2 and C3 could result in a slightly lower affinity
for κ chains compared to that of domain C*. Residues
49 and 52 from domain C* are, respectively, Glu and
Arg, which make an internal salt bridge in the middle of
the α helix. In domains C2 and C3, these residues are
replaced, respectively, by Lys and Ala. This disrupts the
salt bridge and places the Lys in close proximity to two
Arg residues, which results in an unfavorable
accumulation of positive charges.

As is the case for the five SpA domains with respect
to their interaction with the V H region [13], most PpL domains
should also conserve all their hydrogen bonding
interactions. Thus, we would expect that all the PpL domains
would bind to the V L region in a similar way so that a
whole PpL would contain at least four Ig binding sites.
The equivalence in binding between the domains is
consistent with previous studies on the binding properties
of different PpL constructs. These studies show that
avidity increases with the number of Ig binding domains
[6]. In fact, a four-domain PpL has an avidity 100-fold
higher than the single domain, and a fifth Ig binding
domain does not improve avidity further [6]. By
superimposing a PpL-Fab complex on each V L region of whole
IgG [27, 28], we can show that the distance between
the C terminus of a PpL domain bound to one Fab and
the N terminus of the other bound PpL ranges from
60-120 Å depending on the Fab hinge angle. Four
domains (including their interdomain linkers of 16 amino
acids each) could span this distance, and this could
explain the strong avidity effect observed on Ig binding.
The same pattern of modular domains binding to Igs is
shared by SpA and SpG, which like PpL seem to have
evolved high avidity for Ig by cassette duplication to
allow multiple attachments [29].

[...] in 15% of [...]
[...] with PpL to aid the structural [...]
a. The PpL binding properties are taken from Nilson et al. [8].

Although the Kd of the second interface may be
substantially higher than the first, we cannot exclude the
possibility that the second interface could be
structurally important for PpL avidity. A PpL molecule bound to
Ig through the high-affinity site of one domain could use
the low-affinity site from another domain to interact with
the Ig molecules when this would be sterically more
[...] via the first interface.



Modes of SpG and PpL 

Streptococci and Peptostreptococci are found in the
same habitats, including the human intestinal and
genital tracts. Their cell surface proteins share many
com[...] of gene fragments may have occurred between these
two infective bacteria [29]. With only 15% sequence
identity, the Ig binding domains are the least
homologous regions between these cell surface proteins.
Despite the low sequence identity, these domains share a
common fold, with an α helix packed against a
four-stranded β sheet. The NMR and crystal structures of
both domains have shown that the main structural
difference is the orientation of the α helices [7, 30]. The helix
runs almost parallel to the β strand direction in the PpL
β sheet, whereas in SpG it runs diagonally across the
sheet. This arises from a difference in the loop between
the α helix and the strand β3, which is one residue
shorter in PpL. Since this is the region of SpG that binds
to Fc, it may in part explain the absence of Fc binding
by PpL.

The interaction of SpG with C H 1 and that of the first
interface of PpL with V L regions of Fab (Figure 3) share
similar features. Both domains bury equivalent surface
areas upon binding to the Fab and have two common
structural features. Firstly, the same region of these two
domains is buried in the interactions since SpG also
interacts through its strand β2 [31]. Secondly, in both
cases this strand extends a β sheet of the Fab through
a β zipper interaction. This interaction involves five
main-chain/main-chain hydrogen bonds in the SpG-C H 1
interface [30] but only three hydrogen bonds in the PpL-V L
first interface. The external strand A of the V L region
involved in the interaction presents a bulge due to
proline at position L8 in the middle of the strand, and this
bulge shortens the β zipper. Although the SpG and PpL
domains are clearly the result of divergent evolution [29],
they have maintained a common binding strategy for
interacting with the β strands from different Ig domains
(Figure 3) through a main-chain-to-main-chain hydrogen
bonding network.

Like SpA, PpL domains target a wide repertoire of
[...]mains interferes with antibody-antigen recognition, they
may be used to aid the crystallization of antigen-Fab
[32, 33] or membrane protein-Fv complexes [34]. We
suggest that these small protein domains should be
used in a combinatorial manner to increase the variety
of possible crystal contacts that can be formed. This
would help to solve some problematic crystallization
cases. Knowledge of the structure of the PpL-Fab
complex will facilitate the engineering of PpL variants with
extended binding capacity, for example to encompass
λ chains and improve their utility as reagents for Ig
detection and purification.



Several pathogenic bacteria present cell surface
proteins with immunoglobulin binding properties that
confer on them an advantage during invasion and
colonization of host tissues. Some of these bacterial proteins
interact with a wide proportion of the immunoglobulin
repertoire through interactions with constant regions of
immunoglobulin that are highly conserved in sequence.
Two bacterial proteins, Staphylococcus aureus protein
A (SpA) and Peptostreptococcus magnus protein L
(PpL), interact with a wide range of antibodies by
targeting variable, rather than constant, regions of Fab.

We recently reported the crystal structure of an SpA
domain in contact with the variable-heavy (V H) region of
an Fab [13], and we now report the crystal structure of
a PpL domain in complex with the variable-light (V L)
region of the same Fab. Both bacterial domains interact
with the framework part of these variable regions without
contacting the hypervariable loops. The positions
identified in both complexes account well for the wide
repertoire of immunoglobulins recognized by both bacterial
domains. In contrast with SpA, which targets conserved
residues of the V H external β sheet, PpL targets a portion
of the V L region that is not strictly conserved in sequence
but that maintains a constant backbone conformation
among most V L regions encoded by κ genes.

A single domain of this multidomain protein manifests
PpL activity. The PpL domain appears to have two
independent binding sites for V L, which interact with very
similar areas on the light chain but with markedly
different affinities. The residues involved in the V L interaction
are well conserved among the different PpL domains.
The interaction of the single domain accounts well for
two of the properties reported for the whole protein L.
Firstly, PpL exhibits an avidity effect for whole
immunoglobulin, and the overall position of a single bacterial
domain suggests that a whole PpL with up to five
domains in tandem should be able to bind the two V L
regions of an IgG. Secondly, PpL could induce histamine
release by bridging two IgE bound to their Fcε receptors
at the surface of human basophils or mast cells. The
conservation of PpL positions in all domains is also in
agreement with this last property.



[...] grown by vapor diffusion at room temperature in
sitting drops by the mixture of a reservoir solution (13%-16% [wt/wt]
monomethyl polyethylene glycol [MPEG] 5000, 100 mM imidazole
malate [pH 8.5]) with an equal volume of protein solution (5 mg/ml)
containing different Fab 2A2:PpL molar ratios. The ratio was
optimized with [...] the growth of [...]

[...]taining 50% reservoir solution and 50% cryo-solution (25% [wt/wt]
MPEG 5000, 25% [v/v] ethylene glycol, 9% [v/v] xylitol, 100 mM
HEPES [pH 7.5]). Data were recorded at 120 K from a single crystal
[...] The crystal belongs to the orthorhombic space group P2₁2₁2₁
with a = 55.2 Å, b = 87.3 Å, and c = 210.5 Å. [...] in Table 5.

[...] structure with the program AMoRe [39] by using coordinates of Fab
2A2 (PDB code: 1DEE) [13]. The variable (V L -V H) and the constant
(C L -C H 1) regions were used separately as search models for the
[...] Fab 2A2-PpL complex (Table 5).

Fo - Fc electron density maps [...] [40]. The difference maps show
electron density for a single PpL domain sandwiched between two Fab
fragments. Both the NMR structure of the B1 domain [7] and the crystal
structure of the C* domain of PpL (B.J.S., unpublished) were fitted
globally in the electron density maps with the best fit for the crystal
structure. The 2 Fab:1 PpL model was refined within the 20-2.7 Å
resolution range in CNS [41] and was rebuilt in O [40]. Detectable
[...] from Glu-20 to Ala-80 (see Figure 2 [...] by NMR [7], the 19 amino
[...] and thus absent from the final [...]. Small loops from the C H 1
region (H136 to H143 [...] to H204) of both Fab mo[...] removed from
the final st[...] both Fabs, the PpL domain, and in particular the
V L -PpL [...]

[...] Y64W mutant was made by the mutation of residue 55 on the Y64W
template DNA by the use of the antisense primer 5' TGC TAA TAA
AGC TGC ATA TCT 3'. The site of the m[...]
Partially Folded Intermediates as Critical Precursors of Light Chain Amyloid Fibrils
and Amorphous Aggregates
ABSTRACT: Light chain, or AL, amyloidosis is a pathological condition arising from systemic extracellular
deposition of monoclonal immunoglobulin light chain variable domains in the form of insoluble amyloid
fibrils, especially in the kidneys. Substantial evidence suggests that amyloid fibril formation from native
proteins occurs via a conformational change leading to a partially folded intermediate conformation, whose
subsequent association is a key step in fibrillation. In the present investigation, we have examined the
properties of a recombinant amyloidogenic light chain variable domain, SMA, to determine whether partially
folded intermediates can be detected and correlated with aggregation. The results from spectroscopic and
hydrodynamic measurements, including far- and near-UV circular dichroism, FTIR, NMR, and intrinsic
tryptophan fluorescence and small-angle X-ray scattering, reveal the build-up of two partially folded
intermediate conformational states as the pH is decreased (low pH destabilized the protein and accelerated
the kinetics of aggregation). A relatively nativelike intermediate, I_N, was observed between pH 4 and 6,
with little loss of secondary structure, but with significant tertiary structure changes and enhanced ANS
binding, indicating exposed hydrophobic surfaces. At pH below 3, we observed a relatively unfolded, but
compact, intermediate, I_U, which was characterized by decreased tertiary and secondary structure. The I_U
intermediate readily forms amyloid fibrils, whereas I_N preferentially leads to amorphous aggregates. Except
at pH 2, where negligible amorphous aggregate is formed, the amorphous aggregates formed significantly
more rapidly than the fibrils. This is the first indication that different partially folded intermediates may
be responsible for different aggregation pathways (amorphous and fibrillar). The data support the hypothesis
that amyloid fibril formation involves the ordered self-assembly of partially folded species that are critical
soluble precursors of fibrils.



Immunoglobulin (Ig) 1 light chains are involved in several 
protein deposition diseases, including one resulting in the 
formation and deposition of amyloid fibrils (light chain or 
AL amyloidosis) and another known as light chain deposition 
disease that involves amorphous protein deposits (1, 2). The
morphology of the deposited aggregates in these two diseases 
is clearly different, and typically patients exhibit only one 



This res[...]
[...] were collected at [...] Stanford Synchrotron Radiation
Laboratory (SSRL). SSRL is supported by the U.S. Department of
Energy, Office of Basic Energy Sciences, and in part by the National
Institutes of Health, National Center for Research Resources,
Biomedi[...] Department of Energy, Office of
Biological and Environmental Research.

* Correspondence should be addressed to this author at the
Department of Chem[...], Santa [...]-2935. E-mail: [...]

Present address: [...] University, Stanford, CA 94305.

1 Abbreviations: SAXS, small-angle X-ray [...]; TEM,
transmission electron microscopy; AFM, atomic force microscopy; TFT,
Thioflavin T; ANS, 8-anilino-1-naphthalenesulfonate; CD, circular
[...]



form of light chain deposition. However, there is at least 
one report of a patient exhibiting both AL amyloidosis and 
LCDD involving the same light chain (3). The exact length
of light chains in amyloid deposits varies, but is usually in
the 110-130 residue range (12-14 kDa) corresponding to
the intact variable domain (4).

The molecular mechanisms leading to amyloid formation
are poorly understood. In this report, we address the question
of why some immunoglobulin light chains form amyloid and
related deposits while others do not, in particular the
hypothesis that protein aggregation arises from the
self-association of partially folded intermediates. Support for this
hypothesis has been found with proteins such as transthyretin
(5) and lysozyme (6). We postulate that amyloid fibril
formation from native proteins occurs via a conformational
change leading to formation of a partially folded intermediate
conformation, association of this intermediate to form soluble
oligomers leading to the critical nucleus, and subsequent
formation of the initial fibrillar species, typically a filament
or protofibril, and finally association of protofibrils into
[...]

We have investigated the biophysical properties and
amyloidogenicity of the variable domain of a recombinant
amyloidogenic light chain, SMA, engineered by Stevens et
al. (8). SMA (114 residues, M r = 12 700) was initially
[...] from lymph node-derived amyloid fibrils of an AL
amyloidosis patient (9). A very similar light chain domain,
LEN, was derived from a patient with multiple myeloma
who showed no evidence of renal dysfunction or amyloidosis
(10). The IgG V L domains consist of a highly conserved
framework formed by two sheets of antiparallel β-strands
forming a β-sandwich, and three loops comprising the
complementarity-determining regions (CDR) that form part
of the antigen binding site. The sequences of SMA and LEN
are very similar, differing only at 8 positions out of 114.
Four of these are in CDR3 (Q89H, T94H, Y96Q, S97T),
two are in CDR1 (S29N, K30R), and the remaining two are
in the framework region (P40L, I106L). The high-resolution
crystallographic structure of LEN (1.8 Å) has been solved
(PDB Accession No. 1LVE) (11). Both SMA and LEN
belong to the κIV family of Igs.

The amyloidogenic light chain, SMA, is significantly less
thermodynamically stable than LEN under all conditions
(unpublished observations). The presence of low
concentrations of denaturants also results in fibril formation from the
"benign" LEN (12). In the present study, we used biophysical
characterization of the conformation of SMA as a function
of pH to reveal the presence of two distinct partially folded
intermediates: one with relatively nativelike properties, the
other with relatively unfolded properties. Destabilizing
conditions at physiological pH, e.g., low urea concentration,
also lead to aggregation and fibril formation. Thus, conditions
that result in population of these intermediates lead to
aggregation, supporting the hypothesis that partially folded
intermediates are key precursors [...]

Interestingly, both amorphous and fibrillar aggregates were
observed, and were shown to arise from two different
intermediates.

MATERIALS AND METHODS

Expression and Purification of Recombinant V L SMA. The
recombinant V L domain SMA was purified from JM83 E.
coli cells transformed with the plasmid pklVsma004,
generously provided by Dr. Fred Stevens, Argonne National Lab
(8). The plasmid construct was based on the pASK vectors
constructed by Skerra et al. (13) and contained an ompA
leader for periplasmic localization of the protein to ensure
the formation of the core disulfide bond. The overexpressed
protein was purified using the procedure of Stevens et al.
(8) with minor modifications. Briefly, the recombinant
protein was extracted from the periplasm using osmotic shock
via treatment with ice-cold TES followed by distilled water.
The periplasmic extract was dialyzed against 4 changes of
20 volumes of 10 mM acetate buffer, pH 5.6, and loaded
onto a fast-flow SP Sepharose column. The column was
washed with 10 mM acetate buffer, pH 5.6, and the protein
eluted using 10 mM phosphate buffer, pH 8.0. The fractions
were assayed by SDS-PAGE, and fractions containing the
recombinant protein were pooled, filtered through 0.22 µm
filters, and stored in glass vials. Typical yields were 7-8
mg of purified protein per liter of cells. Protein concentrations
were measured via optical density at 280 nm using the
extinction coefficient of E(0.1%) = 1.8 calculated from the
sequence. The purified protein was stored in 10 mM
phosphate buffer (pH 8.0) at 4 °C and used within 2 weeks
of the initial purification. The purity of the protein
preparations was assayed by SDS-PAGE and by electrospray mass
spectrometry (Micromass Quattro II).
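
The concentration measurement described above follows directly from the Beer-Lambert law. As a minimal sketch, assuming the garbled coefficient denotes the absorbance of a 1 mg/mL (0.1%) solution in a 1 cm cell, and using the stated molecular mass of 12 700, the conversion from an A280 reading to mg/mL and to µM could look as follows. The function names, the default path length, and the example absorbance are hypothetical.

```python
def concentration_mg_per_ml(a280, e_01_percent=1.8, path_cm=1.0):
    """Protein concentration in mg/mL from absorbance at 280 nm.

    e_01_percent is the absorbance of a 1 mg/mL solution per cm of path
    (assumed meaning of the coefficient 1.8 quoted in the text).
    """
    return a280 / (e_01_percent * path_cm)


def concentration_um(a280, molecular_mass=12700.0, e_01_percent=1.8, path_cm=1.0):
    """Protein concentration in micromolar, using M r = 12 700 for SMA."""
    mg_per_ml = concentration_mg_per_ml(a280, e_01_percent, path_cm)
    # mg/mL is numerically g/L; dividing by g/mol gives mol/L, then scale to µM.
    return mg_per_ml / molecular_mass * 1e6


if __name__ == "__main__":
    reading = 0.9  # hypothetical A280 value
    print(f"{concentration_mg_per_ml(reading):.2f} mg/mL")
    print(f"{concentration_um(reading):.1f} µM")
```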

Intrinsic Tryptophan Fluorescence Measurements.
Fluorescence measurements were made with a FluoroMax-2
fluorescence spectrometer (Jobin Yvon-Spex). Emission
spectra between 300 and 420 nm were collected with
excitation at 280 nm. Spectra were collected at different pHs
within the range of 10-2 using 0.5 µM protein samples in
50 mM of the appropriate buffer containing 100 mM NaCl.
Spectra were collected at 25 and 37 °C. The stability of SMA
toward urea denaturation was monitored as a function of pH
by recording changes in tryptophan fluorescence intensity
upon excitation at 280 nm and emission at 350 nm at 25 °C.

Samples of SMA (1 µM monomer) were incubated in 20
mM phosphate buffer (pH 7.4), 100 mM NaCl containing
varying amounts of urea (0-8 M) for 2 h to ensure
completion of the unfolding reaction. Data were analyzed
by nonlinear least-squares fitting to a two-state folding model.
The fraction unfolded, F_u, was calculated using the
equation F_u = (y_f - y)/(y_f - y_u), where y represents the observed
fluorescence at a particular concentration of urea, and y_f and
y_u represent the corresponding fluorescence of the folded and
unfolded states, respectively, at that urea concentration. For
baseline fitting, a linear least-squares analysis was performed
to determine the values of y_f and y_u as a function of urea
concentration. The free energies of unfolding were calculated
as a function of urea concentration using the equation ΔG
= -RT ln K_eq, where K_eq = F_u/(1 - F_u). ΔG0 was determined
by extrapolation to zero urea concentration using the
expression ΔG0 = ΔG + m[urea].
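
The two-state analysis above can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' fitting code: it converts fluorescence readings to fraction unfolded, then to ΔG at each urea concentration, and recovers ΔG0 and m by a straight-line fit of ΔG against urea concentration. For simplicity the folded and unfolded baselines are taken as constants, whereas the text fits them as linear functions of urea; the temperature is the 25 °C quoted for the measurements, and the synthetic data (ΔG0 = 20 kJ/mol, m = 5 kJ/mol/M) and all function names are hypothetical.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ mol^-1 K^-1


def fraction_unfolded(y, y_f, y_u):
    """F_u = (y_f - y) / (y_f - y_u)."""
    return (y_f - y) / (y_f - y_u)


def delta_g(f_u, temperature_k=298.15):
    """Free energy of unfolding, dG = -RT ln K_eq with K_eq = F_u / (1 - F_u)."""
    k_eq = f_u / (1.0 - f_u)
    return -R_KJ * temperature_k * math.log(k_eq)


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extrapolate_dg0(ureas, signals, y_f, y_u, temperature_k=298.15):
    """Return (dG0, m) from dG = dG0 - m[urea] over the transition region."""
    dgs = [delta_g(fraction_unfolded(y, y_f, y_u), temperature_k) for y in signals]
    intercept, slope = linear_fit(ureas, dgs)
    return intercept, -slope


def synthetic_signal(urea, dg0, m, y_f, y_u, temperature_k=298.15):
    """Generate an ideal two-state fluorescence value (for demonstration only)."""
    dg = dg0 - m * urea
    k_eq = math.exp(-dg / (R_KJ * temperature_k))
    f_u = k_eq / (1.0 + k_eq)
    return y_f - f_u * (y_f - y_u)


if __name__ == "__main__":
    ureas = [3.0, 3.5, 4.0, 4.5, 5.0]  # hypothetical transition-region points (M)
    signals = [synthetic_signal(u, 20.0, 5.0, 100.0, 20.0) for u in ureas]
    dg0, m_value = extrapolate_dg0(ureas, signals, 100.0, 20.0)
    print(f"dG0 = {dg0:.2f} kJ/mol, m = {m_value:.2f} kJ/mol/M")
```

Only points within the transition region should be used for the linear fit, because F_u values very close to 0 or 1 make the logarithm numerically unstable.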

ANS Binding. 1,8-Anilinonaphthalenesulfonate was
obtained from Kodak, and a stock solution was prepared by
dissolving in water followed by filtration through a 0.2 µm
syringe filter and measuring the concentration using an
extinction coefficient of 5000 M^-1 cm^-1 at 350 nm. The
fluorescence emission spectra of solutions of 10 µM ANS
and 0.5 µM protein were collected between 420 and 600
nm upon excitation at 380 nm as a function of pH at 37 °C.

Circular Dichroism Spectra. CD spectra were collected 
on an AVIV model 62DS spectrometer between 260 and 190 
nm for the far-UV region and between 320 and 250 nm for 
the near-UV region, with a step size of 0.5 nm and an 
averaging time of 5 s and collecting 5 repeat scans. Cells of 
1 and 0.01 cm path length were used for near- and far-UV 
CD measurements with protein concentrations of 0.5 and 1.7 
mg/mL, respectively.

pH Dependence. pH-dependent changes in spectroscopic
data were fit using a modified Henderson-Hasselbalch
equation for one (eq 1) or two (eq 2) transitions, to determine
the midpoints of the transitions:

Y_{obs} = \frac{Y_{I} + Y_{N}\,10^{pH - pH_{m}}}{1 + 10^{pH - pH_{m}}}    (1)

Y_{obs} = \frac{Y_{I_U} + Y_{I_N}\,10^{pH - pH_{m2}} + Y_{N}\,10^{2pH - pH_{m1} - pH_{m2}}}{1 + 10^{pH - pH_{m2}} + 10^{2pH - pH_{m1} - pH_{m2}}}    (2)

where Y_obs is the observed spectroscopic property, Y_N is the
value of the spectroscopic property for the native state, Y_IN
is the spectroscopic property for the nativelike intermediate,
and Y_IU is the spectroscopic property for the unfolded-like
intermediate. pH_m1 and pH_m2 are the midpoints of transitions
from the native state to the I_N intermediate and from I_N to
the I_U intermediate, respectively (36). In eq 1, Y_I and pH_m
denote the intermediate value and midpoint of the single transition.
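
To make the role of the two midpoints concrete, the sketch below evaluates one algebraic form of a Henderson-Hasselbalch-type model for one and two sequential pH-induced transitions. This form is an assumption chosen to be consistent with the definitions in the text (native state at high pH, I_N at intermediate pH, I_U at low pH); the exact expression printed in the original may be arranged differently. The midpoints 6.0 and 3.0 and the signal values are hypothetical placeholders, not fitted results.

```python
def one_transition(ph, y_n, y_i, ph_m):
    """Observed signal for a single N <-> I transition with midpoint ph_m."""
    weight_n = 10.0 ** (ph - ph_m)  # population ratio [N]/[I]
    return (y_i + y_n * weight_n) / (1.0 + weight_n)


def two_transitions(ph, y_n, y_in, y_iu, ph_m1, ph_m2):
    """Observed signal for sequential N <-> I_N <-> I_U transitions.

    ph_m1 is the midpoint of N -> I_N and ph_m2 that of I_N -> I_U.
    """
    weight_in = 10.0 ** (ph - ph_m2)             # [I_N]/[I_U]
    weight_n = weight_in * 10.0 ** (ph - ph_m1)  # [N]/[I_U]
    return (y_iu + y_in * weight_in + y_n * weight_n) / (1.0 + weight_in + weight_n)


if __name__ == "__main__":
    # Hypothetical signal levels and midpoints, for illustration only.
    for ph in (8.0, 6.0, 4.5, 3.0, 1.0):
        value = two_transitions(ph, y_n=1.0, y_in=0.6, y_iu=0.1, ph_m1=6.0, ph_m2=3.0)
        print(f"pH {ph:.1f}: Y_obs = {value:.3f}")
```

In practice these functions would be passed to a nonlinear least-squares routine to obtain the midpoints from measured data; the fitting step itself is omitted here.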

Thin film ATR-FTIR measurements were performed using
a SPECAC out-of-compartment ATR accessory and a Nicolet
800SX FTIR bench. A germanium crystal IRE was used for
making hydrated thin films of ~50-100 µg of protein from
both soluble and insoluble protein as previously described
(14, 15). ATR-FTIR spectra were collected followed by
Fourier transformation of the sample spectra using a clean
crystal spectrum as a background. The water vapor spectrum
was collected by reducing the air purge and subtracted from
the protein spectrum until the spectra were featureless in the
region between 1700 and 1800 cm^-1. ATR-FTIR spectra for
SMA were collected at pH 7.5 in 50 mM sodium phosphate,
100 mM NaCl, at pH 5.0 in 50 mM sodium acetate, 100
mM NaCl, and at pH 2.0 in 20 mM HCl, 100 mM NaCl.
Buffer spectra were subtracted from the sample spectra.
Component spectra were obtained by first determining peak
positions using both second-derivative and
Fourier-self-deconvoluted spectra, followed by curve-fitting to the raw
spectrum (16).

In Vitro Fibril Formation. Fibril formation was monitored
using an assay based on the enhanced fluorescence of the
dye Thioflavin T (TFT) on binding to amyloid fibrils (17).
Amyloid fibrils were grown from purified protein (40 μM) in
50 mM buffer and 100 mM NaCl. A filtered protein sample
(using 0.22 μm syringe filters) was incubated under the
desired conditions in a 1.8 mL flat-bottomed screw-capped
glass vial with moderate stirring using a Teflon-coated
micro-stir bar. Typical fibril growth experiments involved
incubating the protein at 37 °C with constant stirring and
removing aliquots (10 μL) over time for analysis by light
scattering, TEM, and TFT binding (see below). Standard
buffers included 20 mM HCl or phosphate (pH 2), 50 mM
formate (pH 3 and 4), 50 mM cacodylate or acetate (pH 5
and 6), and 20 mM TRIS, 20 mM HEPES, or 50 mM phosphate
(pH 7). Both Rayleigh light scattering and fluorescence
spectra were collected using a SPEX/Jobin-Yvon Fluoromax-2
spectrofluorometer. Constant temperatures were maintained
using a circulating water bath. At each time point, a 220 μL
sample was removed and transferred to a cylindrical quartz
microcell to measure Rayleigh light scattering at 330 nm
with a 1 nm band-pass for both excitation and emission
monochromators. Thioflavin T binding assays were conducted
by adding sample aliquots (10 μL) to 990 μL of 20 μM TFT in
50 mM TRIS, pH 7.5, and 100 mM NaCl in a 1 mL fluorescence
cuvette. Fluorescence emission was monitored with
excitation at 450 nm using a 5 nm band-pass on both the
excitation and emission monochromators. Fluorescence
intensities were reported at 482 nm.

Small-Angle X-ray Scattering. X-ray scattering measurements
were performed on beam line 4-2 at the Stanford
Synchrotron Radiation Laboratory (SSRL) as described
previously (18). The instrument was configured with
a Mo:B₄C multilayer monochromator, an 18 mm beamstop,
and a 218 cm sample-to-detector distance. Data were
collected on protein samples ranging from 0.5 to 10 mg/mL
in 50 mM buffer and 100 mM NaCl at 37 °C using a PTFE
flow-cell with 1.3 mm path length to minimize radiation
damage. Radii of gyration were calculated using the Guinier
approximation (19).
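
The Guinier approximation relates the low-angle scattering intensity to the radius of gyration through ln I(q) = ln I(0) - q²R_g²/3. A minimal sketch of the corresponding linear fit is shown below; the synthetic data, the q range, and the function name are assumptions made for illustration and do not reproduce the authors' analysis.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares fit of ln I versus q^2; returns (R_g, I(0)).

    q is in inverse angstroms, so R_g is returned in angstroms.
    Only points with q * R_g below roughly 1.3 should be supplied.
    """
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    r_g = math.sqrt(-3.0 * slope)
    i_zero = math.exp(mean_y - slope * mean_x)
    return r_g, i_zero


# Synthetic, noise-free data generated for a hypothetical R_g of 19.6 angstroms.
q_values = [0.010 + 0.005 * i for i in range(11)]
data = [100.0 * math.exp(-(qi * 19.6) ** 2 / 3.0) for qi in q_values]
r_g, i_zero = guinier_fit(q_values, data)
print(f"R_g = {r_g:.1f} A, I(0) = {i_zero:.1f}")
```
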

Atomic Force and Transmission Electron Microscopy. 
Transmission electron micrographs were collected using a 
JEOL JEM-100B microscope operating with an accelerator 
voltage of 80 kV. Typical nominal magnifications ranged
from 27000× to 67000×. Samples were deposited on
Formvar-coated 300 mesh copper grids and negatively 
stained with freshly prepared 2% aqueous uranyl acetate. 

For AFM, aliquots of 50 μL of incubation solution were
transferred to an Eppendorf tube and spun to pellet the
precipitated material, which was then washed twice with
water before resuspending in deionized water. A drop of
aggregate suspension was deposited on freshly cleaved mica
and dried immediately with nitrogen gas. The samples were
imaged with an Autoprobe CP AFM (Park Scientific,
Sunnyvale, CA) in the noncontact (NC-AFM) mode. The
tube scanner was a 100 μm Scanmaster (Park Scientific).
NC Ultralevers (Park Scientific) were used as cantilevers.
The resonant frequency was ~100 kHz. The images were
taken in air, under ambient conditions, at a scan frequency of 1-2
Hz, using silicon nitride tips.

NMR Spectroscopy. ¹H NMR spectra were collected using
a Varian Unity+ 500 spectrometer equipped with ultrashims
and a Varian triple resonance probe. Presaturation and
postacquisition digital filtering were used for solvent
suppression. Data were collected on 0.5 mM protein samples
containing 100 mM NaCl and 10% D₂O. The pH was
adjusted by titration with NaOH or HCl as needed. All data
were recorded at 37 °C.

Aggregation as a Function of Protein Concentration. The rate
of aggregation was monitored by static light
scattering using a Nepheloskan instrument from Labsystems 
and a 96-well plate reader. Solutions of 3.5 and 7 mg/mL 
SMA at the appropriate pH were incubated at 37 °C along 
with their corresponding buffers, and scattering was measured 
every 15 min for 6 h. 

pH Jump Experiments. Interconversions between N, I_N,
I_U, and U were monitored by diluting 10 μM SMA at one
pH into buffer of another pH, such that the final concentration
of protein was 1 μM. After manual mixing, the intensity
of either tryptophan fluorescence (excitation 280 nm and
emission 345 nm) or ANS fluorescence (excitation 380 nm
and emission 470 nm) was monitored, using a time-based
scan on a Spex Fluoromax instrument with 1 s time-
averaging.

RESULTS 

Formation of Amyloid Fibrils Is Favored by Destabilizing
Environmental Conditions. The enhanced fluorescence emission
of the dye Thioflavin T on association with amyloid
fibrils provides a very convenient method to monitor the
kinetics of amyloid fibril formation (17, 20). Fibril formation
by SMA was investigated by stirring solutions of SMA at
various values of pH at 37 °C. At room temperature, or in
the absence of stirring, no enhancement of TFT was observed 
for several days. 

The rate of fibril formation from SMA was found to be
very sensitive to a number of extrinsic factors, including pH,
agitation, and temperature. As is typical of amyloid
fibril growth curves, SMA fibrillation kinetics showed
sigmoidal behavior, consisting of a lag phase followed by a
logarithmic growth phase that eventually plateaus (Figure
1). The slight drop in TFT fluorescence sometimes observed
at long time periods may reflect conversion of the mature 
SMA amyloid fibrils to an alternative fibrillar form that has 
a lower affinity toward TFT. For example, we frequently 
observed lateral aggregation of mature fibrils coinciding with 
the decrease in TFT fluorescence, suggesting a possible 
decrease in the availability of TFT binding sites. 

We confirmed that the enhanced fluorescence emission 
of TFT was indeed due to interaction with SMA amyloid 
fibrils by the characteristic Congo red green-birefringence 
observed under crossed-polarization (data not shown) and 
by direct observation with transmission electron microscopy 
and atomic force microscopy. These techniques demonstrated 
the presence of fibrils in systems with high TFT fluorescence, 
and their absence in samples with no increase in TFT 
fluorescence. The TFT assay was shown to be linear over
the concentration range of 0 to > 12 pi% of amyloidogenic 
protein in the assay solution. 

The initial lag in fibril formation (see Figure 1) is often
attributed to the slow assembly of a critical nucleus in a
nucleation-polymerization mechanism (21). The length of
the lag (measured by extrapolating the exponential growth
phase to zero intensity) during SMA fibril formation was a
sensitive function of the pH at which the fibrils were
generated. Some typical data are shown in Figure 1. The
length of the lag decreased from days at pH 7.0, 37 °C, for
40 μM SMA, to a few hours at pH 2 (these values for the
lag time are very sensitive to the rate of stirring or agitation).
In addition, the maximum signal obtained using the TFT 
assay increased with decreasing pH, indicating that more 
fibrils were formed at lower pH values (note that the TFT 
binding assay is performed at pH 7). 
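
The lag-time estimate described above (extrapolating the growth phase back to zero intensity) can be illustrated with a simple sigmoid. The sketch below assumes a logistic growth curve, which is a common empirical choice rather than a model stated in the text; for that curve the tangent at the midpoint reaches zero at t_half - 2/k. All parameter values and function names are hypothetical.

```python
import math


def logistic(t, f_max, k, t_half):
    """Empirical sigmoid for TFT fluorescence versus time (an assumed model)."""
    return f_max / (1.0 + math.exp(-k * (t - t_half)))


def lag_time(k, t_half):
    """Time at which the midpoint tangent extrapolates to zero intensity.

    At t_half the curve has value f_max / 2 and slope f_max * k / 4,
    so the tangent line crosses zero at t_half - 2 / k.
    """
    if k <= 0:
        raise ValueError("growth rate must be positive")
    return t_half - 2.0 / k


# Hypothetical curve: half-maximal signal at 10 h, apparent rate 0.5 per hour.
f_max, k, t_half = 100.0, 0.5, 10.0
t_lag = lag_time(k, t_half)
tangent_at_lag = logistic(t_half, f_max, k, t_half) + (f_max * k / 4.0) * (t_lag - t_half)
print(f"lag time = {t_lag:.1f} h, tangent value there = {tangent_at_lag:.1f}")
```
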

Static light scattering was also used to monitor the kinetics 
of aggregation (Figure 1). Surprisingly, we noted that
amorphous aggregation occurred in the same incubation 
samples of SMA as fibril formation, but at a faster rate. 
Substantial amorphous aggregation was observed from pH 
7 to 4. The amount of amorphous aggregation was propor- 
tional to the protein concentration. The amorphous aggrega- 
tion was observed immediately after starting the stirring at 
37 °C as indicated by increased light scattering (Figure 
1 B,C), whereas the presence of fibrils, as reflected by the 
increase in TFT fluorescence, was not observed for several 
days. Confirmation of the fact that this early aggregation 
was indeed amorphous comes from TEM and AFM micro- 
graphs that showed amorphous material and the absence of 
fibrils (Figure 2). Under certain conditions, e.g., pH 5.0, 37
°C, 100 mM NaCl, 50-60% of the SMA had precipitated
(as amorphous aggregate) after 24 h of incubation (based
on the absorbance of the supernatant), and no fibrils were
visible by microscopy. In contrast, at pH <3 essentially all
of the precipitate was fibrillar within 24 h. The rate of the 
increase in the light scattering observed at pH 2 correlated 
with that of the increase in Thioflavin T fluorescence (Figure 
1A), suggesting that the predominant species present in 
solution was fibrillar. This was confirmed by TEM, which 
indicated that the aggregates were largely fibrillar, though 
some amorphous material was present. Interestingly, the

Figure 1: pH dependence of amyloid fibril formation by
recombinant V_L domain SMA. Fibril formation was monitored using
Thioflavin T emission at 482 nm upon excitation at 450 nm at pH
2 (A, △), 5 (B, □), and 7 (C, ○). Rayleigh light scattering was also
monitored at pH 2 (A, ▲), pH 5 (B, ■), and pH 7 (C, ●). The
formation of amorphous aggregates precedes the formation of 
amyloid fibrils at pH 7 and 5, conditions that favor the native or 
the nativelike intermediate conformation. The inset to panel A 
shows an expanded time scale. Light scattering is sensitive to the 
presence of both amorphous and fibrillar material, whereas TFT 
fluorescence is selective for fibrillar aggregates alone. Panel D 
shows that in the absence of agitation, at SMA concentrations as 
high as 0.5 mM, no aggregation occurs over at least a 6 h period: 
circles are for pH 7, triangles for pH 2. With agitation, the signal 
would be >1400 due to the aggregation. 

maximum increase in Thioflavin T fluorescence was sig- 
nificantly less at pH 7 compared to pH 5 and pH 2, which 
also correlated with fewer fibrils observed by microscopy 
at pH 7 compared to the lower pH conditions.

nebulous and loosely packed amorphous deposit is observed (B,
bottom). After a few days at pH 5 and 7, both fibrils and amorphous 
deposits are observed. 



Morphology of Fibrillar and Amorphous Deposits. The 
SMA aggregates were examined using atomic force (Figure 
2) and transmission electron microscopy. The images re- 
vealed unbranched, rope-like fibrils (Figure 2A), several 
hundred nanometers in length, most with diameters of ~8 
nm, but some, protofibrils, with diameters of ~4 nm (7). 
Upon incubation at 37 °C at pH 4-6, amorphous aggregates
were observed (Figure 2B). 

Spectroscopic Characterization of Acidic pH Conforma- 
tions of SMA. A number of spectroscopic probes were utilized 
to examine conformational changes in SMA that occur under 
conditions favoring the formation of amorphous aggregates 
(pH 4-6) and amyloid fibrils (pH <3). Both amyloid fibrils
and amorphous aggregates are formed only in solutions of
SMA at 37 °C upon agitation of the solution. Note that all
the spectroscopic analyses were done at 37 °C without
agitation within 2-3 h of preparation, ensuring that only
soluble equilibrium conformations were studied; i.e., neither 
amorphous nor fibrillar aggregates were present in the 
spectroscopic analyses. No light scattering was observed for 
at least 6 h for solutions of SMA at concentrations as high 
as 7 mg/mL (used in the NMR experiments) at various pH

Figure 3: Intrinsic tryptophan emission spectra were measured
with excitation at 280 nm for 0.5 μM protein solution in 50 mM
buffer and 100 mM NaCl at 37 °C for native SMA at pH 7.5 (♦),
denatured SMA in 8 M urea at pH 7.5 (○), and I_U at pH 2 (△).
The wavelength of maximum emission of tryptophan fluorescence 
(panel B) and emission intensity (panel C) are plotted against pH. 
The solid lines are fits to a single ionizable group pH transition 
using eq 1 (see Materials and Methods). The midpoint of pH 
transitions for emission maximum and the intensity at 345 nm were 
5.6 and 3.5, respectively. 

values from 2 to 7 (Figure 1D). Tertiary structure changes
were monitored by tryptophan fluorescence emission, near- 
UV CD, far-UV CD (via the 230 nm peak resulting from 
aromatic clustering), and by ANS binding studies. Secondary 
structure changes were monitored by far-UV CD and Fourier 
transform infrared spectroscopy. 

The two tryptophan residues of SMA, W35 and W50, 
provided convenient spectroscopic means for assessing the 
protein's conformational state. In particular, W35, which was 
quenched by the close proximity of the core disulfide formed 
from C23 and C88 in the native state, was observed to 
provide a probe of the global conformational state of the V_L
domain. Unfolding of the protein resulted in a decrease of
the quenching and a consequent increase in the emission
intensity of W35 (Figure 3A, and ref 31). The second
tryptophan residue, W50, was solvent-exposed in the native
state, based on both the crystal structure of LEN and the
observed λ_max of 347 nm in native SMA and LEN.

The intrinsic tryptophan fluorescence spectra indicated λ_max
of 347 and 355 nm for native and denatured SMA (8 M
urea), respectively (Figure 3A). In addition, denatured SMA
showed a large increase in Trp fluorescence intensity
compared to that of its native conformation. When the pH
of a solution of SMA was lowered from 7.5, significant
changes were observed in the tryptophan fluorescence
emission properties. In the pH 4-6 region, there was a
decrease in λ_max, from 347 to 345 nm (Figure 3B), which
was attributed to the build-up of a partially folded
intermediate. For reasons to be discussed below, this
intermediate was called I_N (for nativelike intermediate).
These data were fit to eq 1, and the midpoint of this pH
transition was calculated to be 5.6. At pH <4, the emission
intensity increases, whereas the λ_max does not change
further (Figure 3B,C). The increase in emission intensity
observed from pH 5 to 2 suggests significant disassembly of
the hydrophobic core of the protein. The midpoint of this
transition was pH 3.4 and appeared to be cooperative
(Figure 3C). The fluorescence spectrum of SMA at pH 2 had
a fluorescence intensity between those of the native and
denatured states, and a blue shift in the maximum emission
to 345 nm. This suggested that substantial residual
structure was present at pH 2. This conformation of SMA,
populated below pH 3, was called I_U. In addition, the
fluorescence spectrum of SMA was measured as a function of
protein concentration, over the 0.05-0.5 mg/mL range, to
confirm the presence of the two intermediates and to
demonstrate the lack of association of the samples under
the experimental conditions.

The near-UV CD spectrum of SMA contained significant 
contributions from the aromatic (tryptophan, tyrosine, and 
phenylalanine) residues that were sensitive to the tertiary 
structure of the protein. The near-UV CD spectrum of native 
SMA (Figure 4A) showed positive peaks at 286 and 296 
nm. These likely reflected contributions from two aromatic 
clusters involving residues Y36, Y86, Y87, F98 and
Y27(d), Y32, Y49, Y91, Y92, observed in the crystal
structure of LEN, which would also be expected to be present
in SMA. The peaks at 286 and 296 nm disappeared with
transition midpoints of pH 3.2 and 3.4, respectively (Figure
4B). These transitions correspond to the formation of the I_U
species, suggesting loss of the nativelike environment of the
aromatic groups in this intermediate. The small positive
ellipticity for SMA at 268 nm showed transitions with
midpoints of 4.9 and 3.7, corresponding to the transitions to
I_N and I_U, respectively (Figure 4B). At pH 2, the near-UV
CD spectrum of SMA lacked these features (Figure
4A), suggesting loss of most of the tertiary structure in I_U,
including the aromatic clusters. As a whole, the near-UV
CD spectra for SMA in the pH 4-6 region indicated that
the underlying tertiary structure is still relatively nativelike
in this pH range, consistent with the presence of a nativelike
conformation in I_N.

The hydrophobic dye ANS has frequently been used as a 
probe to reveal the presence of partially folded intermediates 
due to the presence of increased exposure of contiguous 
hydrophobic surface area (22-24). ANS did not significantly
bind to SMA in its native state, indicating the absence of
exposed hydrophobic pockets. However, a pH-titration of
SMA in the presence of ANS at 37 °C revealed a marked

Figure 4: Near-UV CD spectra for SMA. Panel A shows plots of
molar ellipticity for pH 7.5 (●), pH 5 (□), and pH 2 (△). Panel B
shows the pH dependence of the molar ellipticity at 296 nm (▽),
287 nm (○), and 268 nm (□). The lines through the data are fits to
eq 1 for peaks at 296 and 287 nm, yielding apparent pK_a values of 3.2
and 3.4, respectively, which correspond to formation of I_U. The
data at 268 nm are fitted to eq 2, resulting in two apparent pK_a
values of 4.9 and 3.7. Protein concentration was 0.7 mg/mL.

increase in the fluorescence emission and a blue shift in the 
ANS emission maximum from 510 to 480 nm. Both the 
emission intensity increase (data not shown) and the blue 
shift of the emission were indicative of exposed hydrophobic 
regions in the vicinity of pH 4-6, with a maximum at pH
4.5 (Figure 5). Such an observation was taken to indicate
the build-up of a partially folded intermediate, I_N. The
midpoints of the transitions observed for SMA were at pH
5.2 and 3.8 (Figure 5). The limited ANS binding at low pH
was attributed to the second intermediate, I_U, which appears
to have less contiguous exposed hydrophobic regions. The
data suggest that the I_N intermediate is maximally populated
between pH 4 and 5 for SMA. In contrast, no ANS binding
was observed in the pH 2-10 range for the nonamyloidogenic
homologue LEN, indicating the absence of both
intermediates with LEN (Figure 5). In addition, the structure
of LEN at pH 2 is nativelike, based on small-angle X-ray

Figure 5: pH dependence of binding of ANS to SMA (●) and
LEN (○). The reaction was monitored with 0.5 μM protein solution
and 10 μM ANS, with excitation at 380 nm. Fluorescence emission
spectra were collected between 400 and 600 nm at different pHs. 
The solid line is a fit to two transitions using eq 2 (see Materials 
and Methods). The midpoints of the pH transitions were 5.2 and 
3.8, respectively. 



scattering, circular dichroism, FTIR, and Trp fluorescence
(unpublished observations).

Circular dichroism spectra were collected for SMA from
pH 8 to 2 to probe global secondary structural changes in
the different conformers. The far-UV circular dichroism
spectrum of native SMA at pH 7.5 was rather unusual, in
that it had distinct minima at 230 nm, as well as at 216 nm
(Figure 6A). The former was attributed to contributions from
aromatic interactions and possibly the disulfide, the latter to
β-structure. When the CD spectrum of SMA was examined
as a function of pH at 37 °C, there were relatively small
changes between pH 7.5 and pH 4, with more significant
changes occurring at lower pH (Figure 6). The former are
consistent with loss of some tertiary structure in I_N. The
spectrum at pH 2 was significantly different from the
spectrum of SMA denatured in 7 M urea or 5 M Gdn-HCl
at pH 7.4, indicating significant structure at pH 2. Plots of
the ellipticity at 230 nm against pH reveal the population of
a distinct conformational species in the pH 4-6 region
(Figure 6B). The ellipticity of SMA monitored at 216 nm
(corresponding to β-sheet structure) indicated no change
between pH 7.5 and 4.0, suggesting that I_N is relatively
nativelike. However, below pH 4.0, the negative ellipticity
at 216 nm shifted toward lower wavelengths, consistent with
the loss of β-sheet structure (Figure 6C). The spectrum at
pH 2.0 was a mixture of contributions from β-structure and
random coil conformations, indicating some loss of nativelike
β-structure at pH 2. The data were consistent with the
presence of a relatively unfolded-like intermediate, I_U, at pH
below 3.

Fourier transform infrared spectroscopy (FTIR) has been
used to probe protein structure, and the amide I band (1600-
1700 cm⁻¹) has been used to estimate protein secondary
structure content (25). The ATR-FTIR spectra of hydrated
thin films of SMA at pH 7.5 and 5 revealed that significant
secondary structural changes occur for the I_N intermediate
compared to the native state (Figure 7). The major differences
are an increase in low-frequency β-sheet (1625 cm⁻¹), an
increase in disordered structure (1648 cm⁻¹), increased turn




Figure 6: Far-UV CD spectra of SMA as a function of pH. Panel 
A shows the spectra at pH 7.5 (●), pH 5 (□), and pH 2 (△). The
changes in molar ellipticity at 230 nm and at 216 nm are plotted
against pH in panels B and C, respectively. The solid line in panel
B is a fit to eq 2 and gives apparent pK_a values of 6.3 and 3.7. The solid
line in panel C is a fit to eq 1 and gives an apparent pK_a of 3.5.

(1672 cm⁻¹), and a small decrease in the 1695 cm⁻¹
β-component.

At pH 2.0, the amide I spectrum of I_U is different from
that at pH 7.5 or 5.0, indicating that I_U has a different
secondary structure from native and I_N. The major changes
are a large increase in the looplike structure at 1660 cm⁻¹
and loss of the major β-peak at 1638 cm⁻¹ in the native
spectrum, which is replaced by a new, lower intensity
β-component at 1631 cm⁻¹.

¹H NMR spectra were collected to further assess the
conformational changes that took place in SMA at intermediate
and low pH. As shown in Figure 8A, the NMR spectrum
of SMA at pH 7.0 is characteristic of a tightly folded protein,
having well-dispersed amide, aromatic, and aliphatic proton
resonances. As the pH was reduced to below pH 5 (Figure
8B), only minor changes in the spectra were observed. These
changes included sharpening of many resonance lines
as well as changes in some amide proton chemical shifts.
However, the spectrum was still characteristic of a well-

Figure 7: Amide I region of the FTIR spectrum of SMA. Panel A
shows the spectrum for native SMA at pH 7.5. Panel B shows the
spectrum of the nativelike intermediate (I_N) at pH 5.0, and panel C
shows the spectrum of the unfolded-like intermediate (I_U) at pH
2.0. The raw ATR-FTIR spectra after water vapor subtraction are
shown as thick lines. The thin lines in each panel are the component
spectra obtained after curve-fitting the raw spectra.

folded protein. When the pH was adjusted to below 3,
however, the chemical shift dispersion was lost, with the
amide proton resonances collapsing to an envelope less than
1.5 ppm wide (Figure 8C). The upfield methyl resonances
were also lost below pH 3, with both the aromatic and
aliphatic proton regions of the spectrum showing considerable
loss of dispersion. Notably, no resonances corresponding
to either of the tryptophan indole moieties were apparent in
the low-pH spectra. Additionally, the spectrum at pH 2.0
was considerably different from that recorded for SMA under
strongly denaturing conditions (pH 2, 8 M urea), shown
in Figure 8D. Under these strongly denaturing conditions, a
further loss of dispersion occurs throughout the spectrum,
and significant changes in the chemical shifts of nearly all
of the amide protons occur. As the solution pH is lowered,
the NMR spectra show an increase in signal-to-noise for the
same concentration of protein. This is likely due to dissociation
of the V_L dimer (K_d = 40 μM at pH 7) at lower pH.

Small-Angle X-ray Scattering Characterization of SMA
Conformations. Small-angle X-ray scattering measurements
indicated that SMA became less compact as the pH was

Figure 8: ¹H NMR spectra of SMA. Panel A shows the spectrum
of the native protein at pH 7. Panel B shows the spectrum of the
nativelike intermediate, I_N, at pH 5, and panel C the spectrum of
the relatively unfolded intermediate, I_U, at pH 2. For comparison,
the spectrum of the unfolded protein in 8 M urea, pH 2, is shown 
in panel D. 

reduced from 7 to 2 (at a protein concentration of 80 μM),
with an increase in R_g from 19.6 ± 0.4 Å at pH 7 to 26.8 ±
0.6 Å at pH 2. At pH 5, the protein was only slightly
expanded, having an R_g of 20.5 ± 0.6 Å. Kratky plots of the
scattering data indicated that extensive globularity was
maintained even at low pH, although some denaturation was
apparent at pH 5 and more so at pH 2. The significant
compactness of I_U (R_g = 26.8 Å) is apparent by comparison
of its R_g to that of the fully unfolded protein (R_g > 30 Å).

The stability of SMA was measured at different pHs using urea
denaturation monitored by intrinsic tryptophan fluorescence
and far-UV CD. Tryptophan fluorescence had the advantage
that it permitted the use of low protein concentrations, which
eliminated potential aggregation problems during unfolding.
At pH 7.5, SMA is only marginally stable, with a free energy
of unfolding ΔG° = 4.8 kcal/mol and m = 1.05 kcal/(mol·M).
Similar equilibrium plots were obtained when starting with
either native or unfolded protein, indicating that there was
no hysteresis in the urea unfolding transitions. As the pH
was decreased, the stability of SMA decreased significantly
(Table 1) as indicated by the decrease in the midpoint of
the urea unfolding transition with decrease in pH. In light
of the evidence for formation of a non-native species from
SMA in the vicinity of pH 4-6 (I_N), the urea unfolding data
were not converted into free energy data except for neutral
pH.
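
The linear extrapolation relation ΔG° = ΔG + m[urea] implies a denaturation midpoint at [urea] = ΔG°/m. The sketch below works this through with the pH 7.5 values quoted above, assuming a two-state transition, m in kcal/(mol·M), and a temperature of 37 °C; these assumptions and the function names are illustrative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def delta_g(urea_molar, dg_water=4.8, m_value=1.05):
    """Free energy of unfolding (kcal/mol) at a given urea concentration."""
    return dg_water - m_value * urea_molar


def midpoint(dg_water=4.8, m_value=1.05):
    """Urea concentration (M) at which delta_g is zero."""
    return dg_water / m_value


def fraction_unfolded(urea_molar, dg_water=4.8, m_value=1.05, temp_k=310.15):
    """Two-state fraction unfolded, K = exp(-dG / RT)."""
    k_unfold = math.exp(-delta_g(urea_molar, dg_water, m_value) / (R_KCAL * temp_k))
    return k_unfold / (1.0 + k_unfold)


print(f"midpoint = {midpoint():.2f} M urea")
for urea in (0.0, 3.0, 6.0):
    print(f"{urea:.1f} M urea: fraction unfolded = {fraction_unfolded(urea):.4f}")
```
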

Kinetics of Interconversion. The rates of
interconversion between the native conformation (N) and the
partially folded intermediates (I_N and I_U) were monitored by
pH jumps. Interconversions between N and I_N were monitored
using changes in the ANS fluorescence and jumping
the pH of solutions of SMA from 7 to 5 (N to I_N) and from
5 to 7 (I_N to N). Interconversions involving I_U were followed
by observing changes in the tryptophan fluorescence, with
pH jumps of 7 to 2 (N to I_U), 5 to 2 (I_N to I_U), 2 to 5 (I_U to
I_N), and 2 to 7 (I_U to N). The results from these experiments
show that conversion of N to I_N and I_N to N are fast
processes, complete within the dead time of manual mixing,
indicating that the rate constant is >0.35 s⁻¹. On the other
hand, conversion of either N or I_N to I_U and the reverse are
slower processes. The rates for both N to I_U and I_N to I_U
were the same within experimental error, namely, 0.01 ±
0.002 s⁻¹, consistent with I_N lying on the pathway between
N and I_U. The rates of the transformation of I_U to either I_N
or N were the same, with a rate constant of 0.002 ± 0.0008
s⁻¹, again consistent with I_N lying on the pathway between
N and I_U.
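
To put these rate constants on a more intuitive time scale, they can be converted to half-lives, assuming simple first-order kinetics (t½ = ln 2 / k). The sketch below is illustrative; the function names are hypothetical.

```python
import math


def half_life(rate_constant):
    """Half-life in seconds for a first-order process with rate constant in s^-1."""
    if rate_constant <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2.0) / rate_constant


def fraction_remaining(rate_constant, seconds):
    """Fraction of the starting species left after the given time."""
    return math.exp(-rate_constant * seconds)


for label, k in (("N or I_N to I_U", 0.01),
                 ("I_U to I_N or N", 0.002),
                 ("N <-> I_N (lower bound on k)", 0.35)):
    print(f"{label}: k = {k} s^-1, half-life = {half_life(k):.1f} s")
```

On this basis the N to I_N interconversion has a half-life of under 2 s, whereas formation of I_U and its reversal take on the order of one to several minutes.
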

DISCUSSION 

There is increasing evidence to suggest that protein 
aggregation, including amyloid fibril formation, arises from 
a partially folded conformation of the aggregating protein 
(5, 6, 26-30). The present data strongly support such a
hypothesis for the aggregation of immunoglobulin light chain
variable domains. Protein aggregation has generally been
regarded as being driven by nonspecific hydrophobic
interactions operating on unfolded or collapsed molten-globule
states. On the other hand, extracellular aggregation of
proteins to form amyloid fibrils has been conventionally
attributed to mutations altering the local surface properties
of the native state, thereby introducing new packing
interactions for oligomerization of the native state (26, 31). In the
case of SMA, our results clearly indicate that a nativelike
conformation in the fibrils is highly unlikely. Thus, the model 
for light chain amyloid fibrils proposed by Stevens and co- 
workers (31), and based on the native structure, is incon- 
sistent with the experimental observations. 

The partially folded conformations that are the critical 
precursors to protein aggregation may arise either during the 
folding of newly synthesized proteins, as with inclusion 
bodies, or from the native state, as appears plausible for the 
extracellular amyloid deposits. The build-up of the soluble
precursor that triggers aggregation could involve a combination
of factors including an amino acid sequence leading to
a relatively less stable native state as compared to that of a 
nonaggregating variant and/or the presence of mildly desta- 
bilizing extrinsic conditions. 

Detailed analysis of the properties of the amyloidogenic 
SMA, including its stability, conditions necessary to populate 
partially folded intermediate conformations, propensity to 
aggregate, and kinetics of aggregation and fibril formation, 
provides insight into the molecular basis for aggregation. The 
present results raise a number of interesting questions, such 
as: Which features are responsible for the propensity to form 
fibrils? Why are both amorphous and fibrillar aggregates 
formed, and what is the relationship, if any, between them? 
How do the two partially folded intermediate conformations,
I_N and I_U, fit into the picture? We will begin with this last
question.

Destabilizing Conditions Lead to Partially Folded Intermediate
Conformations. Mildly destabilizing conditions, such
as low pH or low urea concentrations (data not shown), lead
to enhanced aggregation and fibrillation of SMA. The results
of the spectroscopic investigations of SMA as a function of
pH reveal that SMA forms two partially folded conformations,
I_N and I_U, the former being relatively nativelike in its
structural properties, whereas the latter is considerably more
unfolded. The major significance of the observation of these
species is the correlation between formation of amorphous
aggregates and I_N, and formation of fibrils and I_U. Both I_N
and I_U are envisaged as ensembles of conformations that
retain some nativelike structure (more in I_N and less in I_U)
with the remainder of the chain, especially for I_U, being
highly mobile and disordered but biased toward its native
conformation.

The near- and far-UV circular dichroism, Trp fluorescence, 
NMR, and SAXS data all point to I N as being a relatively 
nativelike species with most structural properties similar to 
those of the native state. The significant increase in ANS 
binding, however, points to the critical feature of this 
intermediate, namely, increased exposure of hydrophobic 
surfaces compared to the native state. The increased negative 
ellipticity in the far-UV CD at 230 nm is related to this as 
it probably represents minor structural rearrangements in 
side-chain packing manifested as changes in the CD contri- 
bution of aromatic residues, rather than secondary structure 
changes. The enhanced ANS binding in the pH 4-6 range 
is very consistent with the population of a partially folded 
intermediate (22, 23). Likewise, the FTIR spectrum for I N 
reveals that although the overall secondary structure is quite 
similar to that of the native state, nevertheless there are 
significant structural differences. These include increased turn 
structure and a shift in some of the β-structure to components 
with a lower frequency band, perhaps signifying changes in 
the β-strand interactions. From examination of the spectral 
probes as a function of pH, it is apparent that there is a 
structural transition with a midpoint around pH 5.5. This 
transition, which corresponds to the interconversion of the 
native state to I N, could reflect the titration of histidine or 
carboxylate residues in SMA. The pH-dependent transition 
from N to I N was not observed for the nonamyloidogenic 
LEN (Figure 5 and unpublished observations). The only 
differences in ionizable side chains between LEN and SMA 
are two histidines present in SMA (8); this suggests that one 
or both of these His residues, either directly or indirectly, 
may be responsible for the transitions between N and I N. 

The data indicate that SMA forms a second, more 
unfolded, intermediate, I U, at pH <3. From comparison of 
the spectral probe signals as a function of pH, a common 
transition, attributed to that of I N to I U, with a midpoint in 
the vicinity of pH 3.3 is observed with all the probes. This 
intermediate retains substantial compactness and secondary 
structure, consistent with the presence of a partially folded 
intermediate conformation. Based on the apparent pK of the 
transition between I N and I U, this conformational change is 
apparently governed by the titration of carboxylate groups. 
The fact that at pH 2 the fluorescence signal reflects 
substantial residual structure (significantly increased intensity 
and blue-shifted λ max relative to the unfolded and native 
states) suggests that there is also tertiary structure present 
in I U. However, the near-UV CD spectrum indicates the loss 
of most of the native aromatic side-chain interactions, 
suggesting that the aromatic clusters present in the native 
conformation may no longer be present. The FTIR spectra 
indicate additional loop/disordered structure in I U compared 
to the native and I N conformations and also loss of nativelike 
β-structure. The NMR spectrum of this intermediate also 
clearly indicates that it retains considerable secondary and 
tertiary structure. 

Among the factors that stimulate fibril formation from 
SMA are increased protein concentration, agitation, and 
increased temperature. All these are likely to result in 
increased concentration of non-native conformations (the 
equilibrium between native and non-native states, denatur- 
ation at air-water interfaces, and shifting of the equilibrium 
from native to non-native conformations, respectively), with 
their known propensity to aggregate. These observations 
strengthen the correlation between aggregation and precursor 
partially folded intermediates. 

A minor complication in some of these experiments arises 
from the fact that V L domains are known to dimerize in a 
fashion similar to the association of the V H/V L domain 
interaction in intact immunoglobulins. Stevens and Schiffer 
(32) demonstrated that native SMA exists in a monomer- 
dimer equilibrium with K d = 40 µM under physiological 
conditions. Thus, data collected with high concentrations of 
SMA, such as SAXS and NMR, will be potentially compli- 
cated by the presence of these native dimers, at least at 
neutral pH. The main anticipated effect would be that under 
such conditions the equilibrium between the native confor- 
mation and the intermediates would be shifted in favor of 
the native conformation, rather than a non-native one. This 
leads to a potential decrease in the rate of fibril formation 
under conditions where the protein is present mostly as the 
native dimer (unpublished observations). 
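
The size of this complication can be estimated from the reported dissociation constant. The following is a minimal sketch, assuming a simple two-state 2M <-> D equilibrium with K d = [M]^2/[D] and a total concentration counted in monomer units; the function names and the sample concentrations are illustrative and are not taken from the experiments described here.

```python
import math

KD_UM = 40.0  # monomer-dimer dissociation constant quoted in the text (micromolar)


def monomer_conc(total_um, kd_um=KD_UM):
    """Free monomer concentration for 2M <-> D with Kd = [M]^2 / [D].

    Mass balance: total = [M] + 2[D], which gives the quadratic
    2[M]^2/Kd + [M] - total = 0; the positive root is returned.
    """
    return kd_um * (math.sqrt(1.0 + 8.0 * total_um / kd_um) - 1.0) / 4.0


def dimer_fraction(total_um, kd_um=KD_UM):
    """Fraction of all chains that are present in dimers."""
    return 1.0 - monomer_conc(total_um, kd_um) / total_um


if __name__ == "__main__":
    # Hypothetical total concentrations (micromolar), chosen only for illustration.
    for total in (4.0, 40.0, 400.0, 1000.0):
        print(f"{total:7.1f} uM total -> {dimer_fraction(total):.2f} of chains dimeric")
```

Under these assumptions half of the chains are dimeric when the total concentration equals K d, and the dimeric fraction rises steadily at the higher concentrations typical of SAXS and NMR samples.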

Relation between Stability and Aggregation. The stability 
of SMA is greatly decreased at acidic pH, correlating with 
both increased amorphous and fibrillar aggregation. The fact 
that at lower pH SMA readily forms fibrils suggests that 
removal of nativelike interactions is important prior to 
fibrillation. Our data show that the aggregation of the 
amyloidogenic SMA correlates well with the decreased 
stability of the native state and the population of non-native 
conformations. We believe that it is the differential desta- 
bilization of the native state of SMA relative to the partially 
folded intermediate conformations which is the key feature 
of the amyloidogenesis, and which will be attenuated in 
nonamyloidogenic light chains. These observations are in 
accord with previous investigations of the correlation be- 
tween amyloidogenesis and stability in light chains, which 
have indicated that either destabilizing mutations or sequences 
(12, 33, 34), or destabilizing conditions (35) correlate with 
increased fibrillation. SMA has been shown to 
be about 2.5 kcal/mol less stable than its benign LEN 
homologue (8). Further, it has been shown that destabilizing 
LEN with Gdn-HCl leads to fibrillation (12). 
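
To give a sense of scale for this stability difference, the sketch below converts it into a ratio of unfolding equilibrium constants using K = exp(-ΔG/RT). It assumes that the 2.5 kcal figure is a molar free energy (kcal/mol), a two-state unfolding model, and a temperature of 298.15 K; none of these assumptions is stated in the text, and the names are illustrative.

```python
import math

R_KCAL = 1.9872e-3   # gas constant, kcal/(mol*K)
TEMP_K = 298.15      # assumed temperature (25 C); not specified in the text
DDG_KCAL = 2.5       # stability difference between LEN and SMA quoted in the text


def equilibrium_ratio(ddg_kcal, temp_k=TEMP_K):
    """Ratio K_unfold(less stable) / K_unfold(more stable) for a given ddG."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    ratio = equilibrium_ratio(DDG_KCAL)
    print(f"Non-native states are favored ~{ratio:.0f}-fold more in SMA than in LEN")
```

Under these assumptions the non-native population of SMA would exceed that of LEN by well over an order of magnitude under identical conditions, which is in line with the argument that the less stable protein builds up more aggregation-prone intermediate.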

Amorphous versus Fibrillar Aggregates. The observation that 
both fibril and amorphous aggregate formation occur simul- 
taneously under many conditions raises a number of ques- 
tions, for example: Do they arise from similar or different 
partially folded conformations, and do amorphous aggregates 
convert directly or indirectly to fibrils? 

The coincident kinetics for light scattering and ThT at pH 
<3, in conjunction with the limited amount of amorphous 
aggregates observed by EM, indicate that at these low pHs 
fibrils are preferentially formed, and that the limited amount 
of amorphous aggregates formed under these conditions 
rapidly converts to fibrillar species. Since the spectroscopic 
results suggest that at pH <3 the only significant species 
present is I U, the limited amount of amorphous aggregates 
indicates that it is I U that is responsible for fibril formation. 
Similarly, the fact that the pH at which maximal amounts 
of amorphous material is found is in the vicinity of 5.5 
indicates that I N is primarily responsible for the amorphous 
aggregates. The decreased amplitude of the final ThT signal 
at higher values of pH is attributed to the increased amount 
of amorphous aggregate at higher pHs. 

The kinetics of interconversion between N, I N, and I U are 
consistent with I N being on the pathway between N and I U 
(Scheme 1). However, the possibility that there are separate 
pathways to I N and I U (Scheme 2) cannot be eliminated at 
this time. The fast interchange between N and I N is not 
surprising, given the fact that I N is relatively nativelike. 

The decreased amounts of fibrils observed at higher pH 
values presumably result from the smaller amounts of I U in 
equilibrium with N and I N (even though the equilibrium 
levels of I U may be quite low at higher pHs, any I U lost in 
fibril formation will be replaced by mass action with more 
soluble I U). The strong correlation between the pH depen- 
dence of aggregation (amorphous and fibril formation) and the population 
of the partially folded intermediates supports the hypothesis 
that the observed intermediates are key players in the 
aggregation process. Fibril generation, even under nativelike 
conditions, can be attributed to the Boltzmann distribution 
of ensembles of various states under nativelike conditions. 
Hence, it is possible that the key intermediate, highly populated 
at pH <3, is also present under nativelike conditions, but at 
substantially lower concentration, and is responsible for 
amyloid formation after an extended lag period. 

Aggregation results from the strong self-association ten- 
dency of the partially folded intermediates, probably due to 
the presence of large solvent-exposed hydrophobic patches, 
which are absent in the native and fully unfolded states. The 
increased β-structure observed in the aggregated states 
reflects β-strand–β-strand interactions involved in the 
intermolecular association. Fibril formation is expected to 
involve a number of intermediate states of soluble oligomers 
of partially folded intermediates, potentially populated at very 
low levels. Aggregation occurs under conditions in which a 
suitably high concentration of the key partially folded 
intermediate is present, due to a combination of destabilizing 
factors, such as pH and temperature, or urea, and amino acid 
sequence, as well as the concentration of the intermediate. 
Thus, it is mostly the intrinsically low stability of SMA that 
leads to the build-up of the intermediate leading to aggrega- 
tion, under conditions where more stable light chains form 
negligible intermediate and remain in the native conforma- 
tion. 

The much more rapid formation of amorphous aggregates 
of SMA, compared to fibrils, raises 
a number of questions regarding the nature of the relationship 
between the initially formed amorphous aggregates and the 
more slowly formed fibrils. Based on the data reported here, 
we propose that the I N intermediate populated 
in the pH 4-6 region is the direct precursor of the amorphous 
aggregates. The correlation between I N and amorphous 
aggregates, and I U and fibrillar aggregates, suggests that the 
ratio of the two types of deposits is determined, at least in 
part, by kinetic competition between the pathways leading 
to the two different intermediates. A more detailed investiga- 
tion of the relationship between amorphous and fibrillar 
deposits will be given elsewhere. 

The results of the present investigation firmly establish 
the existence of partially folded intermediates as key precur- 
sors on the aggregation pathway of the amyloidogenic light 
chain variable domain SMA. In addition, the observation of 
two such intermediates is the first report that a given protein 
might have more than one critical intermediate conformation 
on the aggregation pathway, and that such different confor- 
mations may lead to different types of deposits. One 
implication of this is that factors, such as chaperones, which 
may change the effective concentration of one of the 
intermediates may change the nature of the deposits. 

Introduction 

Light chain-associated (AL) amyloidosis is 
characterized by the deposition in tissue of 
monoclonal light chain-related components - a 
pathologic process that leads to organ failure and eventual 
death. The lack of information on the etiology of this disease 
and the fact that there is no effective treatment to prevent 
or remove the abnormal tissue deposits represent major 
scientific challenges, the solutions of which may be relevant 
to other amyloid-associated disorders. 

The purpose of this review is to summarize current 
research efforts that are directed towards elucidation of the 
protein and host factors thought to be responsible for the 
induction and perpetuation of AL amyloidosis. The insights 
that have emanated from these studies and the availability 
of new technologies, especially in structural and molecular 
biology, form the basis of future research. The ultimate goal 
is to apply this information clinically through design of 
novel therapeutic strategies that will be used to overcome 
the devastating impact of this disease. 



Protein factors in the pathogenesis of AL amyloidosis 
Over 20 years have passed since Osserman and 
colleagues documented the almost invariable presence of 
serum or urinary monoclonal immunoglobulins (Igs) in 
patients diagnosed with what was previously termed 
primary amyloidosis 1,2 . The prediction that these molecules 
were involved in the pathogenesis of this disease 2 was 
confirmed when Glenner and co-workers (and subsequently, 
many other investigators) demonstrated that the amyloid 
deposits occurring in such patients were fibrillar in nature 
and composed of monoclonal light chains or, more 
commonly, light chain variable-region (V L ) fragments 3 . The 
unequivocal relationship between secreted and deposited 
light chains has since been established through comparative 
sequence analyses of Bence Jones proteins and amyloid 
proteins obtained from individual patients 3,4 . 

Historically, research on AL amyloidosis has been 
directed mainly towards characterizing the primary 
structural features of monoclonal light chain amyloid- 
associated proteins - i.e., those extracted from amyloid 
deposits or excreted in the form of Bence Jones protein, as 
well as molecules encoded by DNA cloned from bone 
marrow-derived monoclonal plasma-cell populations. The 
goal of these studies has been to determine if particular 
chemical features of the light polypeptide chain render it 
amyloidogenic. A voluminous body of sequence data has 
been generated that describes apparently novel amino acid 
substitutions in the V L of such light chains 5 . Additionally, 
in some cases, post-translational modification (e.g., 
glycosylation) has been thought to account for this 
phenomenon 6 . However, given the rarity of carbohydrate- 
containing light chains and the fact that these components 
do not invariably form amyloid deposits, the functional or 
pathophysiological effects of glycosylation or other post- 
translational modifications (deamidation, etc.) are not 
known. 

Light chain primary structure 

In an effort to differentiate amyloid proteins from those 
considered non-amyloidogenic, the primary structures of 
representative components have been compared. 
Unfortunately, these studies are confounded by the fact that 
while there is a relatively large referenced database on the 
former, the number of documented benign, "non-amyloid," 
or non-nephrotoxic light chains (Bence Jones proteins that 
are not associated with myeloma [cast] nephropathy and 
light chain deposition disease) is markedly limited. Further, 
those proteins designated "non-amyloidogenic" may, in fact, 
represent amyloid-associated light chains derived from 
patients with unrecognized disease. 

Another factor that complicates comparative analyses 
results from the variability inherent in light chain structure: 



First, there are two types of light chains, κ and λ, each 
having distinctive V L (and constant [C L]) domains 5-7 . The 
V L portion of the molecule is the product of two exons - V 
and joining (J) - that encode, respectively, the first ~95 and 
following ~13 residues. The V κ and V λ domains are 
characterized by three hypervariable or complementarity- 
determining regions (CDRs) and four framework regions 
(FRs) that are involved, respectively, in the antigen-binding 
site and in maintaining the structural integrity of the 
molecule (Figure 1). Diversity in sequence results from the 
presence of multiple, functional V κ- and V λ- (~35 each) 
and J κ- and J λ- (5 and 4, respectively) germline genes, as 
well as the recombinatorial process that links the V and J 
segments (variation in the length of CDR3 at the V-J joint 
can also result from the presence at position 96 of non- 
template-encoded, extra residues that are the products of N 
or P nucleotides) 8-13 . 
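
The germline contribution to this diversity can be illustrated with simple arithmetic. The sketch below multiplies the approximate numbers of functional V and J genes quoted above; it deliberately ignores junctional variation and somatic mutation, so the totals are only a lower bound on sequence diversity, and the dictionary and function names are illustrative.

```python
# Approximate numbers of functional germline genes quoted in the text.
V_GENES = {"kappa": 35, "lambda": 35}
J_GENES = {"kappa": 5, "lambda": 4}


def vj_combinations(chain_type):
    """Number of distinct V-J pairings for one light chain type."""
    return V_GENES[chain_type] * J_GENES[chain_type]


if __name__ == "__main__":
    for chain_type in ("kappa", "lambda"):
        print(f"{chain_type}: about {vj_combinations(chain_type)} V-J combinations")
```

Even before junctional and somatic changes are considered, each light chain type therefore offers well over a hundred germline V-J pairings against which an amyloid-associated sequence would have to be compared.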

Allelic differences can also account for diversity in 
primary structure, but of greater importance is the 
pronounced variability that is introduced in light chain- 
encoded germline sequences by somatic mutation. As yet, 
there has been no report of an amyloid-associated light chain 
that is completely identical in sequence to that encoded by 
a V κ- or V λ-germline gene; however, the contributory role 
of somatic mutation to light chain amyloid formation 
remains to be established. 

Structural features related to amyloidogenicity 

What are the structural features that account for the 
amyloidogenicity of certain light chains? It is noteworthy 
that in patients with AL amyloidosis λ-type monoclonal 




FIGURE 1. Schematic representation of V L and C L domains and their genetically-encoded segments. The V L is the product of two gene 
segments, V and J, that encode, respectively, the first ~95 residues and the remaining ~13 residues of the V region. The C L is the product 
of a single gene, designated C, that encodes for the remaining ~106 residues of the light chain. The location of the three hypervariable 
regions or CDRs and of the four FRs are as indicated; the J-encoded portion of the V L encompasses the terminal residue(s) of CDR3 and 
all the FR4 residues. The wavy lines symbolize the location of additional amino acid residues found within the FRs and CDRs. The residue 
numbering system is according to Reference 5. 

Igs predominate, in contrast to the prevalence of κ-type 
components in other plasma-cell dyscrasias such as multiple 
myeloma, light chain deposition disease and Waldenstrom's 
macroglobulinemia 14 . Such overrepresentation of λ chains 
may be attributed to the fact that these components typically 
exist as covalent dimers, in contrast to κ molecules that 
occur in the monomeric as well as dimeric form 15 . This 
difference in quaternary structure may reduce the renal 
catabolism of λ-type components or, alternatively, increase 
their propensity to form amyloid. Most remarkable has been 
the finding that proteins of one particular V λ subgroup - 
V λVI - are invariably associated with this process 16 . Despite 
the relative rarity of this gene family in the normal λ-chain 
population (~5%), we have documented that 37% (13 of 
35) of λ-type Igs obtained from patients with proven 
amyloidosis were of this isotype 17 . Among the primary 
structural features that distinguish these proteins are the 
presence of a unique two-residue insertion in FR3 that 
includes an Asp residue at position 66, as well as a Lys 
residue at position 17 in FR1. Although the tertiary structural 
features of λVI light chains have not as yet been elucidated, 
the interactive potential of these two oppositely charged 
amino acids may so alter the conformation of these 
molecules as to render them especially amyloidogenic (F. 
J. Stevens, personal communication). 
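
The strength of this overrepresentation can be gauged with a one-sided binomial calculation. The sketch below asks how likely it would be to find 13 or more proteins of this subgroup among 35 if each protein independently had the ~5% background probability quoted above; the independence assumption and the use of 5% as an exact baseline are simplifications that are not made in the text, and the names are illustrative.

```python
from math import comb

N_PROTEINS = 35        # lambda-type Igs from patients with proven amyloidosis (from the text)
N_SUBGROUP = 13        # of which belonged to the overrepresented subgroup (from the text)
BACKGROUND_RATE = 0.05 # approximate frequency in the normal lambda-chain population

OBSERVED_FRACTION = N_SUBGROUP / N_PROTEINS


def binomial_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    p_value = binomial_tail(N_SUBGROUP, N_PROTEINS, BACKGROUND_RATE)
    print(f"Observed fraction: {OBSERVED_FRACTION:.2f}")
    print(f"P(at least {N_SUBGROUP} of {N_PROTEINS} by chance): {p_value:.2e}")
```

Under these assumptions the chance probability is far below any conventional significance threshold, consistent with the view that the association of this subgroup with amyloidosis is not a sampling accident.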

To date, comparison of sequence data derived from 
amyloid and "non-amyloid" proteins (both λ- and κ-type) 
has failed to reveal a single, site-specific residue that 
differentiates between these two light chain populations. 
This finding (as well as the apparent irrelevance of 
glycosylation) differs from the case of the 
hemoglobinopathies where, in the case of hemoglobin S, 
for example, the one-residue Glu/Val substitution in the β 
chain so alters the molecule that, when it is deoxygenated, 
sickling occurs 18 . 

In contrast to the theory that a single amino acid can 
render a light chain amyloidogenic, Stevens et al. 19 proposed 
an alternative mechanism to account for this phenomenon; 
namely, this process is based on a series of interactions 
between V L -associated amino acid residues, first between 
those in monomers leading to the formation of dimers, then 
between dimers to form pre-amyloid filaments and finally, 
between filaments to form fibrils (Figure 2). Because 
multiple positions within the V L participate in β-domain 
interactions that result in V L -dimer formation and ultimately 
amyloid fibrils (e.g., salt bridges, hydrogen bonds, or van 
der Waals contacts), almost any amino acid substitution 
could theoretically lead to an increase or decrease in the 
propensity of such interactions to occur, thereby modifying 
the tertiary structural features of the light chain and causing 
it, under appropriate physiological conditions, to become 
amyloidogenic. Conversely, particular amino acid 
substitutions could, in fact, inhibit light chain interactions 
and thus render the protein "non-amyloidogenic." The 
model satisfies several experimentally defined features of 
AL amyloid, including fibril diameter and the presence of 
crossed β sheets that are aligned with the fibril axis and 
create an intrinsically self-stabilizing assembly. 

This hypothesis was supported by the finding that 
amyloid-associated light chains contained potentially 
interactive, chemically distinct residues located at key 
positions within the V L domain 20 . A position-by-position 
comparison of the frequency of particular residues in 180 
human monoclonal κ- and λ-type proteins, 52 of which were 
obtained from patients with known amyloidosis, revealed 
statistically significant differences in the CDRs and FRs of 
the two populations. For example, amyloid-associated 
molecules were most often characterized by the presence 
in the CDRs of sterically accessible, charged residues (e.g., 
Asp). Alternatively, in FR2, the typical basic residue found 
at position 45 in non-amyloid proteins was replaced by Asn 
and in FR1 at position 20, amyloid constituents had a 
hydrophobic Ile residue instead of Thr or Ser. For both κ 
and λ light chains, the majority of amyloid-associated 
positions were distributed along the surface of the light chain 
involved in the antigen-binding site and, therefore, would 
not participate in the internal packing of the V L domain. 
Further, the side chains of residues at such positions would 
extend into the solvent and be accessible for interaction 
with other molecules in solution. 

These chemical differences that potentially modify light 
chain tertiary structure were also predicted to enhance V L- 
dimer formation and result in protein aggregation. Earlier 
studies, using size-exclusion chromatography, demonstrated 
that, on the basis of polymer formation, Bence Jones 
proteins obtained from patients with amyloidosis or other 
types of pathologic light chain deposits could be 
distinguished from non-toxic components 21 . The 
demonstration that amyloid-associated proteins share 
unique primary structural features and readily aggregate in 
an in vitro system provides further evidence for the three- 
step molecular mechanism by which light chains form 
fibrils, i.e., dimer -> filament -> fibril 19,20 . 

Wetzel and colleagues 22 have hypothesized a somewhat 
different theory to explain light chain amyloidogenesis. 
They posited that, while multiple substitutions at key sites 
in the V L are responsible for light chain amyloidogenicity, 
this phenomenon of amyloid fibril formation results because 
such replacements so modify the tertiary structure that the 
protein becomes partially or completely unfolded. It is this 
intermediate form, then, that is the culprit in AL amyloid 
formation (as has been postulated to occur in the 
transthyretin-associated amyloidoses 23 ). Experimental 
support for this theory was obtained using V L fragments 

FIGURE 2. Computer-based models of V L interactions leading to the formation of amyloid fibrils. (A) Backbone representation of a V L 
dimer. This basic structure has been found for most Ig light chains which have been studied crystallographically. The shading surrounding 
the "backbone" indicates the volume occupied by atoms. The portion of the molecule that constitutes the equivalent of an antibody's 
antigen-binding site is located at the "top" of the molecule in this representation; among light chains, most amino acid substitutions occur 
in this area. (B) Hypothetical arrangement of V L dimers during the formation of non-covalent polymers. The antigen-binding site of one 
dimer interacts with the surface of the axially-opposed end of a second dimer following a 90° rotation about the two-fold symmetry axis. 
Many amyloid-forming light chains have amino acids of opposite charge, as highlighted in this figure, located in positions to form high- 
affinity "salt bridges." A consequence of the linking of V L domains is that any weak "non-specific" affinity of the dimer surface for particular 
tissue surface features, such as proteoglycans, is amplified. Thus, a light chain, which individually would not interact with an organ or 
tissue, might be specifically bound to an organ when polymerized. (C) Interaction of V L -dimer filaments. When viewed from a different 
angle, the filament shown in (B) has a convoluted or "sawtooth" surface that makes it possible to bring together two or more filaments. In 
this figure, an anti-parallel mechanism of bringing filaments together is illustrated; several amino acid positions at which substitutions could 
either enhance or suppress the docking of filaments are highlighted. (D) Space-filling representation of the proposed V L -dimer polymerization 
model. (Figure furnished by Dr. Fred J. Stevens, unpublished data.) 

derived from DNA recombinant technology. These 
molecules, expressed in a bacterial system, were based on 
the V κ dimer REI - a presumably non-amyloidogenic 
protein, the tertiary structure of which had been elucidated 
by x-ray crystallography 24 . Six V L constructs were prepared 
that contained replacements thought to be unique in κ- or 
λ-type amyloid-associated light chains. These 
modifications, due to their locations, were expected to 
destabilize the folded structure of the V L domain. In contrast 
to the wild-type REI component, all six mutants, at low 
pH, formed aggregates and bound Congo red. When 
examined in the presence of guanidine hydrochloride 
(GdnHCl), four of the six were unstable and aggregated to 
produce amyloid-like fibrils. The most destabilizing effect 
occurred when a highly-conserved Arg at position 61 was 
replaced by Asp. This substitution was predicted to eliminate 
or decrease the potential to form a salt bridge/hydrogen bond 
with another Asp residue located at position 82 in an 
adjacent loop. Although the GdnHCl in vitro stability assay 
may provide a means to predict the fibrillogenic capacity 
of a Bence Jones protein, the relevance of this system to in 
vivo amyloid formation remains to be established. 

The aforementioned studies involving comparative 
analyses of light chain sequence have made it increasingly 
evident that such data per se are not sufficient to differentiate 
amyloid from non-amyloid components. In order to define 
more precisely the relationship between particular amino 
acid residues and light chain amyloidogenicity, it is 
necessary to determine how variations in sequence affect 
the overall tertiary structure of the molecule. 

Because amyloid proteins cannot be crystallized, 
computer modeling offers a powerful new tool to predict 
the three-dimensional impact imparted on a protein by 
modification of and interactions between amino acids. 
Additionally, the ability to generate via recombinant 
technology a replenishable source of human light chains in 
which primary structural modification can be induced 
represents an important means to advance our understanding 
of the protein basis of amyloidogenicity 22 . This 
technique, as well as computer technology, provides exciting 
new approaches to research in this field, since manipulation 
of key amino acids and, most importantly, visualization of 
resulting interactions should provide information on why 
certain proteins form amyloid. 

Light chain fragmentation in AL amyloidosis 

The structural requirements for the in vivo conversion 
of soluble light chain precursor proteins into insoluble 
amyloid fibrils have not been definitely established - e.g., 
it is unknown if the native protein itself can form amyloid 
fibrils in vivo or if this process is initiated or accelerated by 
light chain fragmentation. In rare instances, the protein 
extracted from AL amyloid deposits was found to be 
composed exclusively of an intact light chain 4,27 . More 
commonly, these extracts consist of light chain fragments 
(V L or V L plus a portion of the C L ) 3 . It has not been 
conclusively established if these components are produced 
de novo 28,29 or result from catabolism or in situ proteolysis 
of intact light polypeptide chains. The fact that we have 
found in virtually all AL amyloid extracts examined a 
certain, albeit small, amount of the complete light chain 
suggests that the common occurrence of fragments 
represents a degradative process. It is noteworthy that, based 
on the tertiary structural features of light chains, as 
evidenced by x-ray crystallography, the C-terminal residues 
of amyloid fragments (in positions 152-154) are typically 
located in sterically exposed regions of the molecule; such 
sites would thus be accessible for endopeptidase digestion. 
We have been able to generate similarly sized fragments 
using various types of endoproteases (e.g., cathepsin D, 
pepsin, or that contained in a lysosomal extract prepared 
from kidney). Additionally, we have found that amyloid 
fragments isolated from different organs from the same 
patient can vary in molecular mass 30 . Although such 
variations may reflect deposition of synthetically-derived 
light chain fragments, it is also possible that light chains 
are deposited in a relatively intact form as amyloid and that 
degradation occurs as mediated by local tissue factors. If 
light chain proteolysis is, indeed, essential for amyloid 
formation, the presence or extent of light chain 
fragmentation could have prognostic or therapeutic 
implications. 

Experimentally, it has been shown that endoprotease 
digestion of certain Bence Jones proteins obtained from 
patients with or without amyloidosis yielded V L fragments 
that had the characteristic features of amyloid 31-34 . This 
finding may indicate that the C L portion of the molecule 
interferes with amyloid formation or, as has been shown in 
vitro, is remarkably susceptible to proteolysis as compared 
to the V L domain 33 . The demonstration that regions within 
the light chain are potentially amyloidogenic has come from 
studies of Eulitz and co-workers 36 38 , who found that tryptic 
digestion of certain extensively reduced and alkylated 
amyloid-associated Bence Jones proteins yielded 
precipitates that exhibited green birefringence after Congo 
red-staining and were fibrillar when viewed by polarization 
light microscopy and electron microscopy, respectively. The 
insoluble tryptic digests, when rendered soluble in 
trifluoroacetic acid and subjected to HPLC- 
chromatographic separation on a reversed-phase column, 
were found to contain characteristic 18- to 43-residue V L - 
and C L -related peptides (this discovery was analogous to 
the observation that synthetic 10- to 28-residue peptides 
corresponding to portions of the Alzheimer's disease 
amyloid-associated β protein, the islet amyloid polypeptide 
[IAPP], or prion protein [PrP] could form amyloid fibrils 
in vitro 29-41 ). More recent studies in our laboratory have led 
to the identification and synthesis of 12- to 24-mer V L - 
related peptides that are particularly fibrillogenic (Ref. 30 
and Niewold, Th. A. et al., unpublished studies). These 
findings suggest that, under appropriate conditions, all light 
chains are potentially amyloidogenic and that the 
susceptibility of these components to proteolytic cleavage 
may depend on other factors involved in light chain 
catabolism. That certain Bence Jones proteins can 
themselves function as peptidases 44 could also be of 
pathologic significance; however, the relevance of this 
phenomenon to the in vivo production of amyloid remains 
to be established. 

If, indeed, 12- to 24-mer "amyloid" peptides are 
generated in vivo through proteolysis of precursor 
monoclonal light chains, such components may contribute 
to the pathogenesis of AL amyloidosis. Lansbury and 
colleagues 45-48 have proposed that amyloid formation 
requires or is accelerated by a seeding mechanism whereby 
preformed amyloid fibrils serve as a nucleation substance 
that initiates conversion of a soluble precursor protein or 
peptide into amyloid. Three factors were deemed essential 
for a nucleation-dependent polymerization process to occur: 
First, a lag time is necessary before detectable aggregates 
appear; second, there must be a critical concentration of 
the monomer before polymerization occurs; and third, 
polymerization is accelerated by a preformed "seed" or 
nucleus. That this phenomenon may ensue in vivo has been 
evidenced by the rapid formation of amyloid in the 
experimental hamster and mouse AA models 49,50 through
the administration of "amyloid-enhancing factor" (AEF) - 
a sonicated preparation of an organ that contains either AA- 
or AL-related amyloid fibrils. 
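
The three criteria listed above (a lag time, a critical monomer concentration, and acceleration by a preformed seed) can be illustrated with a toy numerical sketch. This is not the model of Lansbury and colleagues: the rate law, the rate constants, the nucleus size and the units are all illustrative assumptions, and the function names are hypothetical.

```python
def simulate_aggregation(total_protein, seed_ends=0.0, critical_conc=1.0,
                         k_nucleation=1e-4, k_elongation=1.0,
                         nucleus_size=2, dt=0.01, t_end=200.0):
    """Euler-integrate a toy nucleation-dependent polymerization model.

    Returns a list of (time, aggregated_mass) points. All quantities are in
    arbitrary units; the rate constants are illustrative assumptions.
    """
    fibril_ends = seed_ends      # growth-competent ends; > 0 when seeded
    aggregated = 0.0             # protein mass converted to fibrils
    trajectory = [(0.0, aggregated)]
    for step in range(1, int(round(t_end / dt)) + 1):
        monomer = total_protein - aggregated
        excess = max(monomer - critical_conc, 0.0)
        # Slow primary nucleation, only above the critical concentration.
        nucleation = k_nucleation * monomer ** nucleus_size if excess > 0 else 0.0
        # Fast elongation: existing ends consume monomer above the threshold.
        growth = k_elongation * excess * fibril_ends
        fibril_ends += nucleation * dt
        aggregated += growth * dt
        trajectory.append((step * dt, aggregated))
    return trajectory


def lag_time(trajectory, total_protein, critical_conc=1.0, fraction=0.1):
    """Time at which `fraction` of the aggregatable protein has aggregated, or None."""
    target = fraction * (total_protein - critical_conc)
    if target <= 0:
        return None              # below the critical concentration
    for time, aggregated in trajectory:
        if aggregated >= target:
            return time
    return None


unseeded = lag_time(simulate_aggregation(5.0), 5.0)
seeded = lag_time(simulate_aggregation(5.0, seed_ends=0.1), 5.0)
subcritical = lag_time(simulate_aggregation(0.5), 0.5)
print(unseeded, seeded, subcritical)
```

Under these assumptions the seeded run reaches the 10% threshold sooner than the unseeded run, and the run started below the critical concentration never aggregates, which mirrors the qualitative behavior described in the text.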

However, in vitro experiments have indicated that 
seeding is protein-specific; i.e., there must be 
complementarity between seed and some portion of the 
precursor protein for growth to occur, as shown in the case 
of βA4 and prion fibrillogenesis 45,51-53. The demonstration
that amyloid formation can be accelerated by an apparent
nucleation-dependent polymerization mechanism provides,
as suggested by Jarrett and Lansbury 46, a rational therapeutic
approach - namely, reduction of precursor protein 
concentration and interference with the generation of the 
nucleus should slow this relentless process. In this regard, 
α1-antichymotrypsin and a silicate compound (Na4SiO4)
were found to inhibit βA4 fibrillogenesis 54,55. Another agent
- 4'-iodo-4'-deoxydoxorubicin (IDOX), an iodinated
anthracycline compound - was shown to bind with high
affinity to AL as well as other types of amyloid deposits
and also interfered with in vitro fibrillogenesis. When
incubated with AEF, this agent significantly reduced AA
amyloid formation in an in vivo experimental animal 
system 56 . Further, five of eight patients with AL amyloidosis 
who were treated with this compound had objective 
evidence of amyloid resorption, despite no diminution in 
Bence Jones proteinuria 57 . 

Pathologic heterogeneity of light chain deposits 

Although the three major forms of the human light 
chain-associated diseases (myeloma [cast] nephropathy, 
light chain deposition disease and AL amyloidosis) exist as 
discrete entities 14,58,59, the coexistence of two distinct forms
of light chain-associated disease - e.g., light-chain
deposition disease and amyloidosis - has been noted 59-62. It
is not known whether each form results from the presence 
in individual patients of multiple Bence Jones protein 
components, from the potential of a single Bence Jones 
protein to induce more than one type of pathology, or from 
a natural progression of one disease state to another, as 
evidenced by the common involvement of blood vessel 
walls in non-fibrillar and fibrillar forms of light chain 
deposition. 

The fact that a monoclonal light chain can assume 
multiple conformers according to the solvent used for 
crystallization 19,63 provides evidence that physiological
factors can profoundly influence VL dimer interaction. The
failure of a particular Bence Jones protein to form amyloid 
might be attributed to the greater propensity of the protein 
to aggregate as amorphous casts (myeloma [cast] 
nephropathy) or as punctate, linear deposits (light chain 
deposition disease). Alternatively (as previously discussed), 
particular amino acid substitutions may prevent VL
aggregation or, if light chain fragmentation is a prerequisite
for amyloid formation, such substitutions may so modify
the native conformation of the light chain that
protease-sensitive sites in the CL domain are rendered inaccessible.
Furthermore, local factors may change the protein thus
deposited: Initially the deposits may be non-fibrillar but
are subsequently modified and assume the β-pleated
structure typical of amyloid fibrils. 

Host factors in the pathogenesis of AL amyloidosis

Accessory molecules

Interactions between monoclonal light chains and other 
biologically active molecules may also lead to pathologic 
deposition. For example, myeloma (cast) nephropathy has 
been thought to be the product of precipitated complexes 
comprised of Bence Jones protein and a component 
produced by the renal distal tubules - Tamm-Horsfall 
protein 64-66. In the case of AL amyloid, the deposits, in
addition to light chains, have been shown to contain several
chemically diverse substances, including amyloid P
component 67, the highly-sulfated glycosaminoglycans
(GAGs) heparan sulfate and dermatan sulfate 68-70 and
apolipoprotein-E (Apo-E) 71. Other types of molecules, such
as ubiquitin, complement-related components, collagen,
fibronectin and laminin, have been detected as well. The
demonstration that certain of these compounds (e.g.,
Apo-E) can accelerate fibril formation in vitro has led to their
designation as "pathologic chaperones" 71-73. It has been
proposed that these molecules induce a β-pleated-sheet
conformation of the precursor protein that, in turn, leads
eventually to fibril formation. However, we and others 34,36-38,74
have demonstrated that isolated light chains and related
fragments can form amyloid fibrils in vitro in the absence 
of ancillary compounds. Thus, it is possible that the effects 
of these substances are secondary; i.e., binding is non- 
specific and results from the highly adsorptive nature of 
the β-pleated-sheet structure of the amyloid fibril. The
"pathologic chaperones," which are not found in non- 
amyloid forms of light chain deposition 62 - 72 , may not 
necessarily facilitate fibrillogenesis but, rather, participate 
in other functions that are part of the amyloidogenic process 
- e.g., GAGs may play an important role in selective organ 
localization (see below) of AL amyloid deposits and P 
component has been implicated as a factor that can inhibit 
or prevent the resolution of this material 75 . 

Light chain metabolism 

Essential to the pathogenesis of AL amyloidosis is a 
continuous endogenous source of precursor protein, i.e., 
the monoclonal light chain. However, the amount of protein 
required for this process is unknown. Since the catabolic 
half-life of light chains is normally < 2 h, the concentration 
of these components in serum or urine reflects only a small 
percentage of the protein synthesized by the monoclonal 
plasma-cell population 76 . In contrast to patients with 
multiple myeloma, individuals with AL amyloidosis most 
often have a low percentage of bone marrow plasma cells 
(median 5%) and minimal Bence Jones proteinuria (median 
0.4 g/24 h) 77 . Although there is evidence to suggest that a 
population of monoclonal B cells circulates and serves as 
precursor to the bone marrow-derived plasma cells 78 , it is 
unlikely that there exists a significant extramedullary 
cellular source of monoclonal light chains (except in the 
case of localized AL amyloid deposits 79 ). There is as yet no 
evidence to indicate that the light chain synthetic rate of 
monoclonal plasma cells obtained from patients with AL 
amyloidosis is greater than that of plasma cell populations 
associated with other forms of pathologic light chain 
deposition. The fact that patients with AL amyloidosis can 
have extensive disease despite a seemingly low serum or 
urine concentration of precursor proteins suggests (in the
absence of an inordinately high rate of synthesis) an
exceptional propensity of these molecules to form amyloid.
It is also possible that such patients may have unusually
high concentrations of light chain binding factors 80-82 or a
molecular defect that limits or prevents the elimination of 
the AL deposits. 
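
The point made above, that a catabolic half-life of less than 2 h leaves only a small fraction of synthesized light chain in circulation, can be made concrete with a short worked calculation. The sketch below assumes simple first-order (exponential) clearance and takes the half-life as exactly 2 h, which is only the upper bound stated in the text; the function name is hypothetical.

```python
def fraction_remaining(hours, half_life_hours=2.0):
    """Fraction of a protein pool left after first-order clearance.

    Assumes exponential decay; the 2 h default is the upper bound quoted
    in the text, so real clearance would be at least this fast.
    """
    return 0.5 ** (hours / half_life_hours)


for hours in (2, 6, 12, 24):
    print(f"{hours:>2} h: {fraction_remaining(hours):.4%} remaining")
```

With a 2 h half-life, 24 h corresponds to twelve halvings, so only 1 part in 4096 of the protein made a day earlier would remain under this assumption.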

Organ diversity of AL deposits 

Central to the pathophysiology of this disease is the 
remarkable diversity that exists in the organs affected in 
patients with AL amyloidosis 77 - 83 . In some individuals, the 
deposits are confined principally to the kidney and in others 
to the heart, small intestine, or peripheral and/or sympathetic 
nerves, etc. Further, they may be primarily vascular, 
interstitial, or both. Our analyses of protein extracted from 
spleen and other organs have shown no relationship between 
the molecular mass of the deposited material and the 
affected tissue 30 . Whether selective tissue affinity is related 
to specific primary structural features of the light chain that 
result in an interaction with local tissue factors or to an 
antibody-like affinity of certain light chains for particular 
tissue constituents remains to be established. Previous 
attempts to demonstrate organ specificity in vitro using 
fluorescein-labeled native Bence Jones proteins obtained 
from patients with particular organ involvement, although 
showing qualitative differences, were considered 
inconclusive'. 

The demonstration by x-ray crystallography that Bence 
Jones proteins and VL dimers can structurally mimic Fab
fragments, including the antigen-binding site", implies that 
the selective organ deposition of light chains may represent 
an antibody-ligand interaction. Alternatively, AL amyloid 
deposition may be due to the synthesis of the amyloidogenic 
precursor protein by a local monoclonal plasma-cell 
population. This phenomenon would be analogous to the 
site-specific production of other types of amyloid - e.g., 
ACal, AANF, AIAPP, Aβ - associated with precursor
proteins (calcitonin, atrial natriuretic factor, islet amyloid
polypeptide, βA4 protein, respectively). Another
mechanism that may account for the localization of AL 
amyloid deposits within particular organs involves the 
interaction between precursor light chains and tissue- 
specific molecules - e.g., GAGs. It remains to be determined 
if protein structure, organ affinity, or local production of 
light chains, etc. is responsible for the clinical diversity 
found in localized amyloid deposition. 

Pathophysiology of AL amyloidosis 

Other physiologic factors analogous to those implicated 
in myeloma (cast) nephropathy may be involved in the 
pathogenesis of AL amyloidosis. For example, the
precipitation of Bence Jones proteins in the form of renal
tubular casts is accelerated by dehydration or other
conditions that can adversely affect renal function - e.g., 
hypercalcemia, anemia, or infection 14 . Due to the profound 
effect of solvent composition on light chain tertiary 
structure 63 , it is possible that, under appropriate conditions, 
soluble precursor light chains can be induced to form 
amyloid fibrils. 

Other host (patient)-related elements that could account 
for AL amyloid pathogenesis include the local processing 
by macrophages of soluble Bence Jones protein to form 
amyloid fibrils 84 or the failure of these cells to remove or 
prevent fibrillar light chain deposits. In contrast to the typical 
giant-cell reaction to Bence Jones protein-containing casts 
in myeloma (cast) nephropathy, characteristically, there is 
scant inflammatory response to AL amyloid deposits, 
presumably since this material is weakly or
non-immunogenic 85,86. With rare exception, AL amyloid deposits
are apparently irreversible. The failure of the host to degrade 
AL amyloid fibrils may result from a deficiency in 
macrophage function or lack of a requisite proteolytic 
enzyme (as postulated in the pathogenesis of brain amyloid 
in Alzheimer's disease 87,88). The coating of amyloid fibrils
by "pathologic chaperones" - e.g., P component 75 - may 
also interfere with fibril degradation. Although there is no 
documented difference in the association of such 
components with λ- or κ-type amyloid fibrils, it has been
reported that α1-antitrypsin was capable of disaggregating
λ- but not κ-chain-containing fibrils 89.

In vivo models of AL amyloidosis

Further insight into the pathogenesis and, ultimately, 
the effective treatment of AL amyloidosis depends, in part, 
on the availability of in vivo experimental models that 
duplicate the human form of this disease. We have reported 
the results obtained with one such model whereby mice 
were repeatedly injected with different Bence Jones proteins 
obtained from patients with AL amyloidosis. In these 
experiments, the human proteins were deposited in the 
kidney and other organs of the recipient animals in the form 
of amyloid 90 ". The human light chain amyloid also 
contained mouse amyloid P component. Conversely, no 
deposits were found in mice injected under similar 
conditions with a "control," i.e., non-amyloid, Bence Jones 
protein. Due to the relatively large amount of protein needed 
for injection and the length of time required for the amyloid 
to form, other models in which human amyloid-associated 
light chains are produced transgenically or by transfectomas
may prove more suitable. The development of light chain- 
related amyloid in such in vivo models will provide an 
invaluable means for further research on the pathogenesis, 
treatment and prevention of AL amyloidosis. 

Abstract

In our review, we introduce an organizational scheme for membrane 
protein function. It is the relationship between structure, dynamics, 
and environment that endows the membrane and its constituents 
with remarkable sensitivity and robustness. Our understanding be- 
gins with landmark advances like those presented in the following 
chapters. Membrane proteins are notoriously difficult to study, and 
so the work presented here on the ADP/ATP carrier [Nury et al. 
(2)], rhodopsin [Palczewski (24)], and the cytochrome b6f complex
[Cramer et al. (35)] represents incredible progress in this now blos- 
soming field. 
INTRODUCTION

Membrane protein structure determination 
is proceeding at an exciting pace, driven by 
the hope that structures can connect decades 
of biochemical and biophysical observations 
to protein function and mechanism. The re- 
views that follow demonstrate that structures 
are capable of resolving long-standing mys- 
teries while suggesting new questions and 
opening new areas of scientific discovery. As 
science as a whole unravels increasingly com- 
plex and finely tuned functionality, can struc- 
ture determination continue to keep pace, or 
will new methodologies and perspectives be 
required? The ultimate test will be to build 
robust molecular-level models capable of pre- 
dicting the functional outcome of any arbi- 
trary structural perturbation. The creation of 
such models will require adopting a broad per- 
spective, one that takes into account dynamic 
deviations from static structures as well as the 
influence of the membrane environment. 

It is helpful to establish a paradigm that 
categorizes and organizes the relevant
contributors to function. Structures should, and
will, act as the nucleation points for this 
organization. This is because understanding 
mechanisms is ultimately a matter of chem- 
istry, and one cannot invoke chemical ideas 
of the mechanism without knowing where 
the atoms are. As we detail in the discus- 
sion that follows, evolution has exploited three 
categories of molecular-level organization in 
order to achieve efficient and diverse membrane
protein functions: structure, molecular
dynamics, and environmental constraints.
As suggested schematically in Figure 1, they 
are inextricably linked, each influencing the 
other, collectively dictating membrane pro- 
tein function. By recasting the structure- 
function relationship in this way, we suggest 
that a comprehensive view of membrane protein
function may be more readily achieved.



STRUCTURE 

Any modern model for predicting function 
must start with structure. As demonstrated in 
the following reviews, static structures can el- 
egantly illuminate the structure-function re- 
lationship. Collectively, the articles detail a 
series of new features only now visible in 
the increasingly high-resolution structures. 
The emergence of these elements, which include
protein oligomerization and complexation,
lipidation, glycosylation, and the presence
of structural waters, suggests that future
models may need to account for a high level 
of complexity and structural variability. 

Oligomerization is increasingly seen as a 
common motif in membrane proteins (1), and 
dimerization in particular is thought to be 
the rule for each of the three proteins dis- 
cussed below. As the authors point out, dis- 
tinguishing between dimers and higher or- 
der oligomers is far from trivial. Although 
this difficulty may be primarily due to experimental
limitations such as dissociation by
detergent, it may also reflect true biological
variability: Perhaps there is a functionally
relevant equilibrium distribution of n-mers
within cell membranes. This possibility sug- 
gests an investigation into the evolutionary 
origin of oligomerization. We suggest four 
distinct, though not exclusive, explanations. 
First, in most studied cases, oligomerization 
serves a functional role, and this function most 
likely drove its evolution. For example, the 
cytochrome b6f dimer interface is thought to
form an electron transfer bridge for "cross 
talk" between the two monomers. In the case 
of rhodopsin, dimerization is thought to reg- 
ulate G-protein coupling. Second, stability of 
newly evolved proteins must have been criti- 
cal, and oligomerization may provide an effi- 
cient way to select for stabilizing mutants. In 
the case of a homodimer, for example, a sin- 
gle mutation at the interface could be twice 
as efficient in stabilizing the protein com- 
pared with a single mutation in a monomer. 
Similarly, a third explanation is that forma- 
tion of oligomeric structures can augment ge- 
netic efficiency. For example, in the several 
ion channel structures we now know, iden- 
tical subunits surround the ion pathway, re- 
quiring the coding of only a single unit to 
form a larger structure. As a fourth possible 
explanation, we speculate that if dense packing 
of membrane proteins is evolutionarily im- 
portant for optimizing functional output per 
unit area of membrane, then the evolution 
of oligomeric interfaces might have been a 
structural adaptation to support high packing 
density while minimizing energetically unfa- 
vorable protein-protein contacts. Evolution 
would then have built upon this through adaptations
that functionalized the interface.

Like oligomerization, the presence of 
strongly bound lipids and specific sites for wa- 
ter molecules found in recent high-resolution 
structures suggests a potentially large 
degree of structural, and hence functional, 
variability. What is their role in structural 
stability and what is their functional significance?
In the case of b6f, structural lipids are
suggested either to be stabilizers, acting as 
"structural struts," or functional, "imposing 
restraints on protein dynamics." In the case of
the ADP/ATP carrier, removal of structural
cardiolipin molecules, as Nury et al. (2) 
point out, leads to a 20% decrease in protein 
activity. How dynamic is the lipid binding? 
Should these lipids be considered as ligands 
or as parts of the structure? In some cases 
it seems that the lipid requirement is not 
specific, but in others it is. Imagine the 
impact if it is found that these interactions 
with lipids are a highly regulated and ubiqui- 
tous cellular phenomenon. Similar questions 
can and should be asked for the fascinating 
case of structural waters, as well as for all 
posttranslational modifications, including 
glycosylation. The potential combinatorial 
explosion because of variable glycosylation 
patterns on the surface of membrane proteins 
underscores our contention, as we now 
discuss, that the size of structure space may 
be staggeringly large if all variations have 
functional relevance. 



WHY DYNAMICS? 

In many cases, a protein must undergo a 
dynamic conformational transition between 
discrete structural states to carry out its func- 
tion. Such transitions, for example from state 
A to state B, involve a change in the thermodynamic
free energy of the system, $\Delta G_{A \to B}$.
The ADP/ATP carrier protein located in the
mitochondrial membrane, discussed by Nury
et al. (2), is a good example. The authors point
to the "induced transition fit" (ITF) mecha- 
nism to explain the dynamics of the protein. 
In this mechanism, described by Klingenberg 
(3), the membrane protein exists in multiple,
discrete conformations, with the metabolite 
only binding perfectly to the highest energy 
state. The energy of this transition state is 
then utilized in triggering further conforma- 
tional changes necessary for metabolite re- 
lease. The total free energy change of such 
a process is the sum over all intermediate steps,

\[
\Delta G_{A \to B} = \sum_{i=1}^{N-1} \Delta G_{i \to i+1},
\]

with $i = 1$ corresponding to state A, $i = N - 1$
corresponding to state B, and $N$ commonly
assumed to be a finite and reasonably small 
number of states. 

Just how many relevant energy states exist 
for any given mechanism? Five such states are 
suggested in the case of the ITF mechanism.
In the case of ion channels, models built from 
gating behavior suggest an even larger number
(4). If one thinks in terms of a multidimensional
free energy landscape for transitions in
proteins, then these discrete states are found 
either in local energy minima (intermediates) 
or maxima (transition states). However, the 
energy wells corresponding to the minima 
may be quite broad, and transitions between 
them may follow multiple paths. Each of these 
paths may, in fact, be populated with an ex- 
tremely large number of functionally perti- 
nent conformational substates. Such an idea is 
supported by the case of calmodulin, wherein 
analysis of disorder in the crystal structure 
has suggested that the protein "may sam- 
ple a quasi-continuous spectrum of confor- 
mations" (5). Similarly, a very large number 
of paths between any two states are theoreti- 
cally possible, although the accessible number 
is likely diminished by environmental condi- 
tions (see below). This all suggests that the 
overall free energy change may also be ex- 
pressed as 

\[
\Delta G_{A \to B} = \sum_{j=1}^{N'-1} \Delta G_{j \to j+1}
                   = \sum_{m=1}^{N''-1} \Delta G_{m \to m+1} = \cdots
\]

Which path is taken determines the work 
required in effecting a transition and thus has 
important consequences for protein efficiency 
and function. Could path choice be a vari- 
able parameter in cellular control, for exam- 
ple, through concerted variations in the pro- 
teins' surroundings (i.e., the membrane)? 
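
The path sums above can be sketched numerically. Because each sum telescopes, every path from A to B gives the same total free energy change, while the intermediate states visited differ from path to path. The state names and energies below are hypothetical values in arbitrary units, and the "highest point" measure is only a crude stand-in, assumed here for illustration, for how demanding a path is; it is not a quantity defined in the text.

```python
import math

# Hypothetical free energies (arbitrary units) of conformational states.
STATE_ENERGY = {"A": 0.0, "I1": 3.0, "I2": -1.5, "J1": 6.0, "B": -4.0}


def path_steps(path, energy=STATE_ENERGY):
    """Per-step free energy changes along a path given as a list of state names."""
    return [energy[b] - energy[a] for a, b in zip(path, path[1:])]


def total_change(path, energy=STATE_ENERGY):
    """Sum of the step changes; telescopes to energy[last] - energy[first]."""
    return sum(path_steps(path, energy))


def highest_point(path, energy=STATE_ENERGY):
    """Highest state energy visited, measured relative to the starting state."""
    return max(energy[state] for state in path) - energy[path[0]]


path_one = ["A", "I1", "I2", "B"]   # N' = 4 states
path_two = ["A", "J1", "B"]         # N'' = 3 states

for path in (path_one, path_two):
    print(path, total_change(path), highest_point(path))
```

Both paths return the same total change, as the equalities above require, but they differ in the highest intermediate they must visit.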

Clearly, given the relevance of conformational
states to function, single structures
can only partially describe a mechanism. The
range of available energy states along a tran- 
sition path may be exploited by proteins 
for tuning function. How do we investigate 
the transition path? Is it possible to predict, 
solely from two static structures, the relevant
path between the two states? The problem
is especially complicated because structural
dynamics span a range of frequencies,
from low (large conformational changes or 
domain movements, more traditionally as- 
sociated with function) to high (side-chain 
rotameric isomerizations, which may, for example,
play an important role in electron
transport proteins, which are otherwise relatively
immobile). There is, however, significant
progress being made in the area of computational
biology that portends a numerical
solution to this problem, although further improvements
in the computational representation
of chemistry are needed. For example,
molecular dynamics simulations generally
provide information about high-frequency 
fluctuations (6), and normal mode analysis 
can help predict the lower frequency paths 
(7). More sophisticated computational meth- 
ods known as path sampling have been shown 
to yield highly detailed transition path infor- 
mation for simple systems (8-12), and more 
recently for biomolecules (13-16). 

ENVIRONMENT 

As we suggested above, a high degree of com- 
plexity exists owing to structural variability 
and dynamic transition paths. A third factor 
that contributes to complexity is the mem- 
brane environment. Structure determination 
generally requires isolation of the protein 
from the remainder of the biological system.
However, any proper thermodynamic analy- 
sis must include all relevant components of 
the system and must pay particularly close 
attention to boundaries where energy is ex- 
changed. The contribution of the membrane 
environment therefore deserves considera- 
tion in analysis of function. We have noted the 
multiplicity of available conformational transition
paths, and that path choice determines
the trade-off between energy spent and work 
done. The number of available paths, along 
with the paths themselves, must be altered by 
physical constraints placed on the protein by 
the membrane environment. Therefore, iso- 
lation of the protein from its native membrane 
provides only a partial story. 

The membrane consists of, among other 
things, lipids and a multitude of proteins. In 
addition to severely biasing protein confor- 
mational states and the paths between them 
(17), the membrane itself is capable of stor- 
ing energy through conformational flexibil- 
ity of its own (18, 19). It has been clearly 
established that lipid composition varies pro- 
foundly in different membranes (20). Addi- 
tionally, single membranes can be highly het- 
erogeneous, as in the case of "lipid rafts," 
the functional manifestation of lipid domains 
known for decades to exist in synthetic mix- 
tures (21-23). Sustaining such diversity, for 
example, through variable lipid composition, 
may cost the cell valuable resources, so spe- 
cific functional rationales should be examined. 
Furthermore, biophysical measurements have 
established that physical properties of mem- 
branes, such as curvature elastic stress, pro- 
foundly affect the efficiency of membrane 
proteins. As one landmark example, a spe- 
cific conformational transition in rhodopsin, 
the molecule discussed by Palczewski (24), is
favored by the highly curved reverse hexag- 
onal phase, rather than the standard lamel- 
lar phase (25). It seems likely that cells might 
take advantage of this type of specificity to 
tune function by genetically regulating their 
membrane-specific constituent lipid populations.
Other significant membrane properties
that are now appreciated as influencing structure
and function of membrane proteins are,
among others, lateral tension (26, 27), hy- 
drophobic matching (28-32), and electrostat- 
ics (33, 34). By modifying these properties, 
cells have afforded themselves a highly tun- 
able molecular environment and thus, we sug- 
gest, have gained incredibly fine control over 
protein function. 

Molecular function underlies all of bi- 
ology, from the processing of input in the 
simplest organisms to higher consciousness 
in humans. How intricate must the molecu- 
lar machinery be to support such a diverse 
and elegant world? It is, of course, natural 
that we think in terms of discrete states. Pri- 
marily, it makes investigation of structure- 
function more tractable. Additionally, experi- 
mental techniques such as crystallization lend 
themselves to thinking in terms of single, or 
averaged, structures. The perspective we offer 
here poses a challenge to modern biologists 
given the immensity of combinatorial possi- 
bilities it suggests. It is hard to imagine how 
to begin to study this complexity; however, 
it is a challenge worth pursuing. If we seek a 
molecular-level explanation of phenomena as 
mystifying as consciousness, then our sugges- 
tion of near infinite combinatorial complex- 
ity seems less of a stretch. Clearly, the mere 
existence of complexity does not prove its evo- 
lutionary or functional significance. The im- 
portant tasks are to digest the complexity and 
find the simplifications that remain true to the 
biology. We suggest that computational meth- 
ods, along with increased experimental reso- 
lution, both spatially and temporally, should 
facilitate this effort. The progress reflected in 
the reviews that follow suggests that we are 
well on our way. 

Perth, Australia

2. Doctor of Philosophy (awarded in 2003) 
Immune System Therapeutics Ltd.
Immune System Therapeutics Ltd., Sydney, Australia.

Role: Drive projects that increase the intellectual property profile of the company 

3. Senior Post-Doctoral Research Fellow (2003-2006) 
Harvard University, School of Public Health, 

Department of Immunology and Infectious Diseases, Boston USA. 

Role: Perform and supervise research into Malaria vaccine candidates 

4. Laboratory class teaching assistant - Undergraduate Biochemistry (1999-2001) 
La Trobe University, Department of Biochemistry. Melbourne, Australia. 

Role: Supervise undergraduate students in practical classes 



5. Research Assistant, Phillips Laboratory (1997-1999). 

La Trobe University, Department of Biochemistry. Melbourne, Australia. 

Role: Support research into potential anti-cancer compounds 




OTHER ACADEMIC TRAINING: 

1. Environmental Auditor Certification Workshop (July, 2009) 
Thomson - Reuters 

Aim: Training and certification for auditing environmental management systems 

2. Environmental Management Systems Workshop (July, 2009) 
Thomson - Reuters 

Aim: Additional training in environmental management systems and auditing 
• Received a student bursary to attend the Fourth Australian Peptide Conference
Several Provisional patent applications written and filed
• Biological Sciences and Public Health Retreat (Harvard University) 2003
Jennings C.V., Craik D.J., Anderson M.A.



RESEARCH PUBLICATIONS: 
Int J Parasitol. 2008 Mar;39(4):399-405.
Subcell Biochem 2008.:46-57.